Abstract

In this work we analyze the mean-square performance of different strategies for distributed estimation 
over least-mean-squares (LMS) adaptive networks. The results highlight some useful properties for 
distributed adaptation in comparison to fusion-based centralized solutions. The analysis establishes that, 
by optimizing over the combination weights, diffusion strategies can deliver lower excess-mean-square-error
than centralized solutions employing traditional block or incremental LMS strategies. We first study
in some detail the situation involving combinations of two adaptive agents and then extend the results
to generic N-node ad-hoc networks. In the latter case, we establish that, for sufficiently small step-sizes,
diffusion strategies can outperform centralized block or incremental LMS strategies by optimizing over 
left-stochastic combination weighting matrices. The results suggest more efficient ways for organizing 
and processing data at fusion centers, and present useful adaptive strategies that are able to enhance 
performance when implemented in a distributed manner.

I. Introduction

This work examines the dynamics that results when adaptive nodes are allowed to interact with
each other. Through cooperation, some interesting behavior occurs that is not observed when the 
nodes operate independently. For example, if one adaptive agent has worse performance than another 
independent adaptive agent, can both agents cooperate with each other in such a manner that the 
performance of both agents improves? What if N agents are interacting with each other? Can all agents 
improve their performance relative to the non-cooperative case even when some of them are noisier than 
others? Does cooperation need to be performed in a centralized manner or is distributed cooperation 
sufficient to achieve this goal? Starting with two adaptive nodes, we derive analytical expressions for the 
mean-square performance of the nodes under some conditions on the measurement data. The expressions 
are then used to compare the performance of various (centralized and distributed) adaptive strategies. The 
analysis reveals a useful fact that arises as a result of the cooperation between the nodes; it establishes 
that, by optimizing over the combination weights, diffusion least-mean-squares (LMS) strategies for 
distributed estimation can deliver lower excess-mean-square-error (EMSE) than a centralized solution 
employing traditional block or incremental LMS strategies. We first study in some detail the situation 
involving combinations of two adaptive nodes for which the performance levels can be characterized 
analytically. Subsequently, we extend the conclusion to N-node ad-hoc networks. Reference [2] provides
an overview of diffusion strategies for adaptation and learning over networks. 

It is worth noting that the performance of diffusion algorithms was already studied in some detail in the
earlier works [3], [4]. These works derived expressions for the network EMSE and mean-square-deviation
(MSD) in terms of the combination weights that are used during the adaptation process. The results in
[3], [4] were mainly concerned with comparing the performance of diffusion (i.e., distributed cooperative)
strategies with non-cooperative strategies. In the cooperative case, nodes share information with each 
other, whereas they behave independently of each other in the non-cooperative case. In the current work, 
we are instead interested in comparing diffusion or distributed cooperative strategies against centralized 
(as opposed to non-cooperative) strategies. In the centralized framework, a fusion center has access to all 
data collected from across the network, whereas in the non-cooperative setting nodes have access only 
to their individual data. Therefore, finding conditions under which diffusion strategies can perform well 
in comparison to centralized solutions is generally a demanding task. 

We start our study by considering initially the case of two interacting adaptive agents. Though
structurally simple, two-node networks are important in their own right. For instance, two-antenna receivers are
prevalent in many communication systems. The data received by the antennas could either be transferred
to a central processor for handling or processed cooperatively and locally at the antennas. Which mode 
of processing can lead to better performance and how? Some of the results in this article help provide 
answers to these questions. In addition, two-node adaptive agents can serve as good models for how 
estimates can be combined at master nodes that connect larger sub-networks together. There has also 
been useful work in the literature on examining the performance of combinations of two adaptive filters 
[5]–[9]. The main difference between two-node adaptive networks and combinations of two adaptive filters
is that in the network case the measurement and regression data are fully distributed and also different
across nodes, whereas the filters share the same measurement and regression data in filter combinations
[5]–[9]. For this reason, the study of adaptive networks is more challenging and their dynamics is richer.

The results in this work will reveal that distributed diffusion LMS strategies can outperform centralized 
block or incremental LMS strategies through proper selection of the combination weights. The expressions 
for the combination weights end up depending on knowledge of the noise variances, which are generally 
unavailable to the nodes. Nevertheless, the expressions suggest a useful adaptive construction. Motivated 
by the analysis, we propose an adaptive method for adjusting the combination weights by relying solely 
on the available data. Simulation results illustrate the findings. 

Notation: We use lowercase letters to denote vectors, uppercase letters for matrices, plain letters for
deterministic variables, and boldface letters for random variables. We also use $(\cdot)^T$ to denote transposition,
$(\cdot)^*$ for conjugate transposition, $(\cdot)^{-1}$ for matrix inversion, $\mathrm{Tr}(\cdot)$ for the trace of a matrix, $\rho(\cdot)$ for the
spectral radius of a matrix, $\otimes$ for the Kronecker product, $\mathrm{vec}(A)$ for stacking the columns of $A$ on top
of each other, and $\mathrm{diag}(A)$ for constructing a vector by using the diagonal entries of $A$. All vectors in
our treatment are column vectors, with the exception of the regression vectors, $\boldsymbol{u}_{k,i}$, which are taken to
be row vectors for convenience of presentation.

A. Non-Cooperative Adaptation by Two Nodes

We refer to the two nodes as nodes 1 and 2. Both nodes are assumed to measure data that satisfy a 
linear regression model of the form: 

$$\boldsymbol{d}_k(i) = \boldsymbol{u}_{k,i} w^o + \boldsymbol{v}_k(i) \tag{1}$$

for $k = 1, 2$, where $w^o$ is a deterministic but unknown $M \times 1$ vector, $\boldsymbol{d}_k(i)$ is a random measurement
datum at time $i$, $\boldsymbol{u}_{k,i}$ is a random $1 \times M$ regression vector at time $i$, and $\boldsymbol{v}_k(i)$ is a random noise signal
also at time $i$. We adopt the following assumptions on the statistical properties of the data $\{\boldsymbol{u}_{k,i}, \boldsymbol{v}_k(i)\}$.

Fig. 1. Nodes 1 and 2 process the data independently by means of two local LMS filters.


Assumption 1 (Statistical properties of the data):

1) The regression data $\boldsymbol{u}_{k,i}$ are temporally white and spatially independent random variables with zero
mean and uniform covariance matrix $R_{u,k} = E\,\boldsymbol{u}_{k,i}^* \boldsymbol{u}_{k,i} > 0$.

2) The noise signals $\boldsymbol{v}_k(i)$ are temporally white and spatially independent random variables with zero
mean and variances $\sigma_{v,k}^2$.

3) The regressors $\boldsymbol{u}_{k,i}$ and noise signals $\boldsymbol{v}_l(j)$ are mutually independent for all $k$ and $l$, $i$ and $j$. ■
It is worth noting that we do not assume Gaussian distributions for either the regressors or the noise 
signals. We note that the temporal independence assumption on the regressors may be invalid in general, 
especially for tapped-delay implementations where the regressions at each node would exhibit a shift 
structure. However, there have been extensive studies in the stochastic approximation literature showing 
that, for stand-alone adaptive filters, results based on the temporal independence assumption, such as (12)
and (13) further ahead, still match well with actual filter performance when the step-size is sufficiently
small [10]–[15]. Thus, we shall adopt the following assumption throughout this work.

Assumption 2 (Small step-sizes): The step-sizes are sufficiently small, i.e., $\mu_k \ll 1$, so that terms
depending on higher-order powers of the step-sizes can be ignored, and such that the adaptive strategies 
discussed in this work are mean-square stable (in the manner defined further ahead). ■ 

We are interested in the situation where one node is less noisy than the other. Thus, without loss of 
generality, we assume that the noise variance of node 2 is less than that of node 1, i.e., 



$$\sigma_{v,2}^2 < \sigma_{v,1}^2 \tag{2}$$



The nodes are interested in estimating the unknown parameter w°. Assume initially that each node k 
independently adopts the famed LMS algorithm [16]–[18] to update its weight estimate (as illustrated in
Fig. 1) according to the following rule:

$$\boldsymbol{w}_{k,i} = \boldsymbol{w}_{k,i-1} + \mu_k \boldsymbol{u}_{k,i}^* \left[\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{w}_{k,i-1}\right] \tag{3}$$

for $k = 1, 2$, where $\mu_k$ is a positive constant step-size parameter. The steady-state performance of an
adaptive algorithm is usually assessed in terms of its mean-square error (MSE), EMSE, and MSD, which
are defined as follows. If we introduce the error quantities:

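$$\boldsymbol{e}_k(i) \triangleq \boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{w}_{k,i-1} \tag{4}$$

$$\tilde{\boldsymbol{w}}_{k,i} \triangleq w^o - \boldsymbol{w}_{k,i} \tag{5}$$

$$\boldsymbol{e}_{a,k}(i) \triangleq \boldsymbol{u}_{k,i} \tilde{\boldsymbol{w}}_{k,i-1} \tag{6}$$

then the MSE, EMSE, and MSD for node $k$ are defined as the following steady-state values:

$$\mathrm{MSE}_k \triangleq \lim_{i \to \infty} E|\boldsymbol{e}_k(i)|^2 \tag{7}$$

$$\mathrm{EMSE}_k \triangleq \lim_{i \to \infty} E|\boldsymbol{e}_{a,k}(i)|^2 \tag{8}$$

$$\mathrm{MSD}_k \triangleq \lim_{i \to \infty} E\|\tilde{\boldsymbol{w}}_{k,i}\|^2 \tag{9}$$

where the notation $\|\cdot\|$ denotes the Euclidean norm of its vector argument. Substituting expression (1)
into the definition for $\boldsymbol{e}_k(i)$ in (4), it is easy to verify that the errors $\{\boldsymbol{e}_k(i), \boldsymbol{e}_{a,k}(i)\}$ are related as follows:

$$\boldsymbol{e}_k(i) = \boldsymbol{e}_{a,k}(i) + \boldsymbol{v}_k(i) \tag{10}$$

for $k = 1, 2$. Since the terms $\boldsymbol{v}_k(i)$ and $\boldsymbol{e}_{a,k}(i)$ are independent of each other, it readily follows that the
MSE and EMSE performance measures at each node are related to each other through the noise variance:

$$\mathrm{MSE}_k = \mathrm{EMSE}_k + \sigma_{v,k}^2 \tag{11}$$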
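The step from (10) to (11) can be written out explicitly. The short derivation below is only a sketch; it assumes, as stated in the text, that the noise $\boldsymbol{v}_k(i)$ is zero-mean and independent of $\boldsymbol{e}_{a,k}(i)$, so that the cross term vanishes:

```latex
\begin{aligned}
E|\boldsymbol{e}_k(i)|^2
  &= E|\boldsymbol{e}_{a,k}(i) + \boldsymbol{v}_k(i)|^2 \\
  &= E|\boldsymbol{e}_{a,k}(i)|^2
     + 2\,\mathrm{Re}\!\left\{ E\,\boldsymbol{e}_{a,k}(i)\, E\,\boldsymbol{v}_k^*(i) \right\}
     + E|\boldsymbol{v}_k(i)|^2 \\
  &= E|\boldsymbol{e}_{a,k}(i)|^2 + \sigma_{v,k}^2
\end{aligned}
```

Taking the limit as $i \to \infty$ on both sides gives the relation between the MSE and the EMSE.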

Therefore, it is sufficient to examine the EMSE and MSD as performance metrics for adaptive algorithms. 
Under Assumption 2, the EMSE and MSD of each LMS filter in (3) are known to be well approximated
by HUE!: 

$$\mathrm{EMSE}_k \approx \frac{1}{2} \mu_k \sigma_{v,k}^2 \mathrm{Tr}(R_{u,k}) \tag{12}$$

and

$$\mathrm{MSD}_k \approx \frac{1}{2} \mu_k \sigma_{v,k}^2 M \tag{13}$$


for k = 1, 2. To proceed, we further assume that both nodes employ the same step-size and observe data 
arising from the same underlying distribution. 
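The stand-alone recursion (3) and the small step-size approximations (12) and (13) can be sketched in a few lines. This is a minimal illustration rather than the authors' simulation code: it assumes real-valued data, i.i.d. standard Gaussian regressors (so that $R_u = I$), and the function names `lms_run`, `emse_theory`, and `msd_theory` are hypothetical.

```python
import random


def lms_run(w_true, mu, noise_std, n_iter, seed=0):
    """Run the stand-alone LMS recursion (3) on synthetic real-valued data.

    Assumption of this sketch: regressors are i.i.d. standard Gaussian,
    so the regressor covariance is the identity matrix.
    """
    rng = random.Random(seed)
    M = len(w_true)
    w = [0.0] * M
    for _ in range(n_iter):
        u = [rng.gauss(0.0, 1.0) for _ in range(M)]
        # Linear regression model (1): d = u w_true + noise.
        d = sum(uj * wj for uj, wj in zip(u, w_true)) + rng.gauss(0.0, noise_std)
        # Output error (4), then the LMS correction.
        err = d - sum(uj * wj for uj, wj in zip(u, w))
        w = [wj + mu * uj * err for wj, uj in zip(w, u)]
    return w


def emse_theory(mu, noise_var, trace_Ru):
    """Small step-size EMSE approximation (12)."""
    return mu * noise_var * trace_Ru / 2.0


def msd_theory(mu, noise_var, M):
    """Small step-size MSD approximation (13)."""
    return mu * noise_var * M / 2.0
```

Both approximations scale linearly with the noise variance, which is why the less noisy node attains the lower EMSE when the step-sizes and regressor covariances are equal.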

Assumption 3 (Uniform step-sizes and data covariance): It is assumed that both nodes employ identical
step-sizes, i.e., $\mu_1 = \mu_2 = \mu$, and that they observe data arising from the same statistical distribution,
i.e., $R_{u,1} = R_{u,2} = R_u$. ■

Under Assumption 3, expression (12) confirms the expected result that node 2 will achieve a lower
EMSE than node 1 because node 2 has lower noise variance than node 1. The interesting question that
we would like to consider is whether it is possible to improve the EMSE performance for both nodes if
they are allowed to cooperate with each other in some manner. The arguments in this work will answer
this question in the affirmative and will present distributed cooperative schemes that are able to achieve
this goal, not only for two-node networks but also for N-node ad-hoc networks (see Sec. VI).



II. Two Centralized Adaptive Algorithms 

One form of cooperation can be realized by connecting the two nodes to a fusion center, which 
would collect the data from the nodes and use them to estimate w°. Fusion centers are generally
more powerful than the individual nodes and can, in principle, implement more sophisticated estimation 
procedures than the individual nodes. In order to allow for a fair comparison between implementations 
of similar nature at the fusion center and remotely at the nodes, we assume that the fusion center is 
limited to implementing LMS-type solutions as well, albeit in a centralized manner. In this work, the 
fusion center is assumed to operate on the data in one of two ways. The first method is illustrated in 
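Fig. 2a, and we refer to it as block LMS. In this method, the fusion center receives data from the nodes
and updates its estimate for $w^o$ according to the following:

$$\boldsymbol{w}_i = \boldsymbol{w}_{i-1} + \mu' \begin{bmatrix} \boldsymbol{u}_{1,i} \\ \boldsymbol{u}_{2,i} \end{bmatrix}^* \left( \begin{bmatrix} \boldsymbol{d}_1(i) \\ \boldsymbol{d}_2(i) \end{bmatrix} - \begin{bmatrix} \boldsymbol{u}_{1,i} \\ \boldsymbol{u}_{2,i} \end{bmatrix} \boldsymbol{w}_{i-1} \right) \tag{14}$$
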
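with a constant positive step-size $\mu'$. The second method is illustrated in Fig. 2b, and we refer to it as
incremental LMS. In this method, the fusion center still receives data from the nodes but operates on
them sequentially by incorporating one set of measurements at a time as follows:

$$\begin{aligned} \boldsymbol{\phi}_i &= \boldsymbol{w}_{i-1} + \mu' \boldsymbol{u}_{1,i}^* \left[\boldsymbol{d}_1(i) - \boldsymbol{u}_{1,i} \boldsymbol{w}_{i-1}\right] \\ \boldsymbol{w}_i &= \boldsymbol{\phi}_i + \mu' \boldsymbol{u}_{2,i}^* \left[\boldsymbol{d}_2(i) - \boldsymbol{u}_{2,i} \boldsymbol{\phi}_i\right] \end{aligned} \tag{15}$$
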
We see from (15) that the fusion center in this case first uses the data from node 1 to update $\boldsymbol{w}_{i-1}$ to
an intermediate value $\boldsymbol{\phi}_i$, and then uses the data from node 2 to get $\boldsymbol{w}_i$. Method (15) is a special case
of the incremental LMS strategy introduced and studied in [25] and is motivated by useful incremental
approaches to distributed optimization [26]–[31]. We observe from (14) and (15) that in going from
$\boldsymbol{w}_{i-1}$ to $\boldsymbol{w}_i$, the block and incremental LMS algorithms employ two sets of data for each such update;
in comparison, the conventional LMS algorithm used by the stand-alone nodes in (3) employs one set
of data for each update of their respective weight estimates.

Fig. 2. Two centralized strategies using data from nodes at a fusion center: (a) block LMS adaptation; (b) incremental LMS adaptation.
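The difference between the block update (14) and the incremental update (15) is easiest to see side by side. The following is a minimal sketch for real-valued data; the helper names `block_lms_step` and `incremental_lms_step` are hypothetical, and `mu_p` stands for the fusion-center step-size $\mu'$.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def block_lms_step(w, data, mu_p):
    """One block LMS update (14): both errors are computed from w_{i-1}."""
    (u1, d1), (u2, d2) = data
    e1 = d1 - dot(u1, w)
    e2 = d2 - dot(u2, w)
    return [wj + mu_p * (a * e1 + b * e2) for wj, a, b in zip(w, u1, u2)]


def incremental_lms_step(w, data, mu_p):
    """One incremental LMS update (15): node 2's error uses the intermediate phi."""
    (u1, d1), (u2, d2) = data
    e1 = d1 - dot(u1, w)
    phi = [wj + mu_p * a * e1 for wj, a in zip(w, u1)]
    e2 = d2 - dot(u2, phi)
    return [pj + mu_p * b * e2 for pj, b in zip(phi, u2)]
```

Expanding the second step of (15) shows that, for the same input, the two updates differ only by a term proportional to the square of the step-size, which is why they behave similarly when the step-size is small.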



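We define the EMSE and MSD for block LMS (14) and incremental LMS (15) as follows:

$$\mathrm{EMSE}_{\mathrm{blk/inc}} \triangleq \frac{1}{2} \lim_{i \to \infty} E\|\boldsymbol{e}_{a,i}\|^2 \tag{16}$$

$$\mathrm{MSD}_{\mathrm{blk/inc}} \triangleq \lim_{i \to \infty} E\|\tilde{\boldsymbol{w}}_i\|^2 \tag{17}$$

where the a priori error $\boldsymbol{e}_{a,i}$ is now a $2 \times 1$ vector:

$$\boldsymbol{e}_{a,i} \triangleq \begin{bmatrix} \boldsymbol{e}_{a,1}(i) \\ \boldsymbol{e}_{a,2}(i) \end{bmatrix} = \begin{bmatrix} \boldsymbol{u}_{1,i} \\ \boldsymbol{u}_{2,i} \end{bmatrix} \tilde{\boldsymbol{w}}_{i-1} \tag{18}$$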



Note that in (16) we are scaling the definition of the EMSE by 1/2 because the squared Euclidean norm
in (16) involves the sum of the two error components from (18). We shall explain later in Sec. VI that in
order to ensure a fair comparison of the performance of the various algorithms (including non-cooperative,
distributed, and centralized), we will need to set the step-size as (see (66))

$$\mu' = \frac{\mu}{2} \tag{19}$$

This normalization will help ensure that the rates of convergence of the various strategies that are being
compared are similar.
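One way to see why halving the step-size equalizes the convergence rates is to compare the mean-error recursions. The sketch below is not the argument of Sec. VI; it only takes expectations of (3) and (14) under Assumptions 1 and 3, using $\tilde{\boldsymbol{w}}_i = w^o - \boldsymbol{w}_i$:

```latex
\begin{aligned}
\text{stand-alone LMS (3):} \quad
  E\,\tilde{\boldsymbol{w}}_{k,i} &= (I_M - \mu R_u)\, E\,\tilde{\boldsymbol{w}}_{k,i-1} \\
\text{block LMS (14):} \quad
  E\,\tilde{\boldsymbol{w}}_{i} &= (I_M - 2\mu' R_u)\, E\,\tilde{\boldsymbol{w}}_{i-1}
\end{aligned}
```

The block update accumulates two regressor terms per iteration, so its mean error contracts through $I_M - 2\mu' R_u$. The two coefficient matrices coincide when $2\mu' = \mu$.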

Now, compared to the non-cooperative method (3) where the nodes act individually, it can be shown that
the two centralized algorithms (14) and (15) lead to improved mean-square performance (the arguments
further ahead in Sec. IV-F establish this conclusion among several other properties). Specifically, the
EMSE obtained by the two centralized algorithms (14) and (15) will be smaller than the average EMSE
obtained by the two non-cooperative nodes in (3). The question that we would like to explore is whether
distributed cooperation between the nodes can lead to superior performance even in comparison to
the centralized algorithms (14) and (15). To address this question, we shall consider distributed LMS
algorithms of the diffusion type from [3], [4], which are further studied in [32]–[40]. Reference [2]
provides an overview of diffusion strategies. Adaptive diffusion strategies have several useful properties:
they are scalable and robust, enhance stability, and enable nodes to adapt and learn through localized
interactions. There are of course other useful algorithms for distributed estimation that rely instead on
consensus-type strategies, e.g., [41]–[43]. Nevertheless, diffusion strategies have been shown to lead
to superior mean-square-error performance in comparison to consensus-type strategies (see, e.g., [38],
[39]). For this reason, we focus on comparing adaptive diffusion strategies with the centralized block and
incremental LMS approaches. The arguments further ahead will show that diffusion algorithms are able
to exploit the spatial diversity in the data more fully than the centralized implementations and can lead
to better steady-state mean-square performance than the block and incremental algorithms (14) and (15)
when all algorithms converge in the mean-square sense at the same rate. We shall establish these results
initially for the case of two interacting adaptive agents, and then discuss the generalization for N-node
networks in Sec. VI.

III. Adaptive Diffusion Strategies 

Diffusion LMS algorithms are distributed strategies that consist of two steps [2]–[4]: updating the
weight estimate using local measurement data (the adaptation step) and aggregating the information from
the neighbors (the combination step). According to the order of these two steps, diffusion algorithms can
be categorized into two classes: Combine-then-Adapt (CTA) (as illustrated in Fig. 3a):

$$\begin{aligned} \boldsymbol{\phi}_{k,i-1} &= a_{1k} \boldsymbol{w}_{1,i-1} + a_{2k} \boldsymbol{w}_{2,i-1} \\ \boldsymbol{w}_{k,i} &= \boldsymbol{\phi}_{k,i-1} + \mu_k \boldsymbol{u}_{k,i}^* \left[\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{\phi}_{k,i-1}\right] \end{aligned} \tag{20}$$

and Adapt-then-Combine (ATC) (as illustrated in Fig. 3b):

$$\begin{aligned} \boldsymbol{\psi}_{k,i} &= \boldsymbol{w}_{k,i-1} + \mu_k \boldsymbol{u}_{k,i}^* \left[\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{w}_{k,i-1}\right] \\ \boldsymbol{w}_{k,i} &= a_{1k} \boldsymbol{\psi}_{1,i} + a_{2k} \boldsymbol{\psi}_{2,i} \end{aligned} \tag{21}$$

for $k = 1, 2$, where the $\{\mu_k\}$ are positive step-sizes and the $\{a_{lk}\}$ denote convex combination coefficients
used by nodes 1 and 2. The coefficients are nonnegative and they satisfy

$$a_{1k} \geq 0, \quad a_{2k} \geq 0, \quad a_{1k} + a_{2k} = 1 \tag{22}$$

for $k = 1, 2$.

Fig. 3. Two diffusion strategies using combination coefficients $\{\alpha, 1-\alpha, \beta, 1-\beta\}$: (a) CTA diffusion adaptation; (b) ATC diffusion adaptation.

We collect these coefficients into a $2 \times 2$ matrix $A$ and denote them more compactly by
$\{\alpha, 1-\alpha\}$ for node 1 and $\{1-\beta, \beta\}$ for node 2:

$$A \triangleq \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} \alpha & 1-\beta \\ 1-\alpha & \beta \end{bmatrix} \tag{23}$$

where $\alpha, \beta \in [0, 1]$. Note that when $\alpha = \beta = 1$, both the CTA algorithm (20) and the ATC algorithm
(21) reduce to the non-cooperative LMS update given by (3); we shall exclude this case for diffusion
algorithms. Observe that the order of the adaptation and combination steps is different for CTA and ATC
implementations. The ATC algorithm (21) is known to outperform the CTA algorithm (20) because the
former shares updated weight estimates in comparison to the latter, and these estimates are expected to 
be less noisy; see the analysis further ahead and also 0, 0. 
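The two orderings in (20) and (21) can be sketched for a two-node network as follows. This is a minimal illustration under stated assumptions: real-valued data, a common step-size `mu` at both nodes, and hypothetical helper names (`lms_adapt`, `cta_step`, `atc_step`). The weights follow (23): node 1 uses $\{\alpha, 1-\alpha\}$ and node 2 uses $\{1-\beta, \beta\}$.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def lms_adapt(w, u, d, mu):
    """Adaptation step: one LMS correction of w using the local data (u, d)."""
    e = d - dot(u, w)
    return [wj + mu * uj * e for wj, uj in zip(w, u)]


def combine(x1, x2, c1, c2):
    """Combination step: convex combination c1*x1 + c2*x2."""
    return [c1 * a + c2 * b for a, b in zip(x1, x2)]


def cta_step(w1, w2, data, mu, alpha, beta):
    """Combine-then-Adapt (20): combine previous estimates, then adapt."""
    (u1, d1), (u2, d2) = data
    phi1 = combine(w1, w2, alpha, 1.0 - alpha)
    phi2 = combine(w1, w2, 1.0 - beta, beta)
    return lms_adapt(phi1, u1, d1, mu), lms_adapt(phi2, u2, d2, mu)


def atc_step(w1, w2, data, mu, alpha, beta):
    """Adapt-then-Combine (21): adapt locally, then combine the updated estimates."""
    (u1, d1), (u2, d2) = data
    psi1 = lms_adapt(w1, u1, d1, mu)
    psi2 = lms_adapt(w2, u2, d2, mu)
    return combine(psi1, psi2, alpha, 1.0 - alpha), combine(psi1, psi2, 1.0 - beta, beta)
```

In the ATC form, the quantities being averaged already include the newest measurements from both nodes, whereas the CTA form averages the estimates from the previous iteration.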

An important factor affecting the mean-square performance of diffusion LMS algorithms is the choice 
of the combination coefficients $\alpha$ and $\beta$. Different combination rules have been proposed in the literature,
such as uniform, Laplacian, maximum degree, Metropolis, relative degree, relative degree-variance, and
relative variance (which were listed in Table III of reference [4]; see also [2], [37]). Apart from these static
combination rules, where the coefficients are kept constant over time, adaptive rules are also possible. In
the adaptive case, the combination weights can be adjusted regularly so that the network can respond to
real-time node conditions El. 01. 04l. 071. BUI.

Now that we have introduced the various strategies (non-cooperative LMS, block LMS, incremental 
LMS, ATC and CTA diffusion LMS), we proceed to derive expressions for the optimal mean-square 
performance of the diffusion algorithms (20) and (21). The analysis will highlight some useful properties
for distributed algorithms in comparison to centralized counterparts. For example, the results will establish
that the diffusion strategies using optimized combination weights perform better than the centralized
solutions (14) and (15). Obviously, by assuming knowledge of the network topology, a fusion center can
implement the optimized diffusion strategies centrally and therefore attain the same performance as the
distributed solution. We are not interested in such situations where the fusion center implements solutions
that are fundamentally distributed in nature. We are instead interested in comparing truly distributed
solutions of the diffusion type (20) and (21) with traditional centralized solutions of the block and
incremental LMS types (14) and (15), all with similar levels of LMS complexity.

IV. Performance Analysis for Two-Node Adaptive Networks 

We rely on the energy conservation arguments [16] to conduct the mean-square performance analysis
of two-node LMS adaptive networks. We first compute the individual and network EMSE and MSD for
the CTA and ATC algorithms (20) and (21), and then deal with the block and incremental algorithms (14)
and (15). The analysis in the sequel is carried out under Assumptions 1–3 and condition (19). Assumption
2 helps ensure the mean-square convergence of the various adaptive strategies that we are considering
here — see, e.g., EJ-JH, ifToll . By mean-square convergence of the distributed and centralized algorithms, 
we mean that $E\tilde{\boldsymbol{w}}_{k,i} \to 0$, $E\tilde{\boldsymbol{w}}_i \to 0$, and $E\|\tilde{\boldsymbol{w}}_{k,i}\|^2$ and $E\|\tilde{\boldsymbol{w}}_i\|^2$ tend to constant bounded values as
$i \to \infty$. In addition, Assumption 3 and condition (19) will help enforce similar convergence rates for all
strategies.

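A. EMSE and MSD for Non-Cooperative Nodes

Under Assumptions 1–3 and as mentioned before, it is known that the EMSE and MSD of the two
stand-alone LMS filters in (3), which operate independently of each other, are given by

$$\mathrm{EMSE}_{\mathrm{ind},k} \approx \frac{\mu \sigma_{v,k}^2 \mathrm{Tr}(R_u)}{2} \tag{24}$$

and

$$\mathrm{MSD}_{\mathrm{ind},k} \approx \frac{\mu \sigma_{v,k}^2 M}{2} \tag{25}$$

for $k = 1, 2$. Using (24) and (25), the average EMSE and MSD of both nodes are

$$\mathrm{EMSE}_{\mathrm{ind}} \approx \frac{\mu \mathrm{Tr}(R_u)}{2} \cdot \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \tag{26}$$

and

$$\mathrm{MSD}_{\mathrm{ind}} \approx \frac{\mu M}{2} \cdot \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \tag{27}$$
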
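B. EMSE and MSD for Diffusion Algorithms

Rather than study CTA and ATC algorithms separately, we follow the analysis in [2], [4] and consider
a more general algorithm structure that includes CTA and ATC as special cases. We derive expressions
for the node EMSE and MSD for the general structure and then specialize the results for CTA and ATC.
Thus, consider a diffusion strategy of the following general form:

$$\boldsymbol{\phi}_{k,i-1} = p_{1k} \boldsymbol{w}_{1,i-1} + p_{2k} \boldsymbol{w}_{2,i-1} \tag{28}$$

$$\boldsymbol{\psi}_{k,i} = \boldsymbol{\phi}_{k,i-1} + \mu \boldsymbol{u}_{k,i}^* \left[\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{\phi}_{k,i-1}\right] \tag{29}$$

$$\boldsymbol{w}_{k,i} = q_{1k} \boldsymbol{\psi}_{1,i} + q_{2k} \boldsymbol{\psi}_{2,i} \tag{30}$$

where $\{p_{lk}, q_{lk}\}$ are the nonnegative entries of $2 \times 2$ matrices $\{P, Q\}$. The CTA algorithm (20) corresponds
to the special choice $P = A$ and $Q = I_2$ while the ATC algorithm (21) corresponds to the special choice
$P = I_2$ and $Q = A$, where $I_2$ denotes the $2 \times 2$ identity matrix. From (23), it can be verified that the
eigenvalues of $A$ are $\{1, \alpha + \beta - 1\}$. In the cooperative case, we rule out the choice $\alpha = \beta = 1$ so
that the two eigenvalues of $A$ are distinct and, hence, $A$ is diagonalizable. Let $A = T D T^{-1}$ denote the
eigen-decomposition of $A$:

$$\underbrace{\begin{bmatrix} \alpha & 1-\beta \\ 1-\alpha & \beta \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 1-\beta & 1 \\ 1-\alpha & -1 \end{bmatrix}}_{T} \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & \alpha+\beta-1 \end{bmatrix}}_{D} \underbrace{\frac{1}{2-\alpha-\beta} \begin{bmatrix} 1 & 1 \\ 1-\alpha & \beta-1 \end{bmatrix}}_{T^{-1}} \tag{31}$$

and let $\lambda_m$ denote the $m$th eigenvalue of $R_u$, whose size is $M \times M$. The analysis then leads to the
following expression for the EMSE at node $k$: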

$$\mathrm{EMSE}_{\mathrm{diff},k} \approx \mu^2 \sum_{m=1}^{M} \lambda_m^2 \, \mathrm{vec}(T^T Q^T R_v Q T)^T (I_4 - \xi_m D \otimes D)^{-1} \mathrm{vec}(T^{-1} E_{kk} T^{-T}) \tag{32}$$

for $k = 1, 2$, where

$$\xi_m = 1 - 2\mu\lambda_m, \quad R_v = \mathrm{diag}\{\sigma_{v,1}^2, \sigma_{v,2}^2\} \tag{33}$$

and $E_{kk}$ are $2 \times 2$ matrices that are given by

$$E_{11} = \mathrm{diag}\{1, 0\}, \quad E_{22} = \mathrm{diag}\{0, 1\} \tag{34}$$

Likewise, we can derive the MSD at node $k$:

$$\mathrm{MSD}_{\mathrm{diff},k} \approx \mu^2 \sum_{m=1}^{M} \lambda_m \, \mathrm{vec}(T^T Q^T R_v Q T)^T (I_4 - \xi_m D \otimes D)^{-1} \mathrm{vec}(T^{-1} E_{kk} T^{-T}) \tag{35}$$

for $k = 1, 2$. Comparing (32) and (35), we note that $\lambda_m^2$ in (32) is replaced by $\lambda_m$ in (35); all the other
factors are identical.
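The general expressions (32) and (35) are straightforward to evaluate numerically because $D$ is diagonal, so $I_4 - \xi_m D \otimes D$ is diagonal as well and its inverse is taken entry by entry. The sketch below is an illustration only: the function name `diffusion_node_metric`, its argument layout, and the `mode` and `power` switches are hypothetical, and the eigenvector matrix $T$ is one valid choice taken from the decomposition in (31).

```python
def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def vec(X):
    """Stack the columns of a 2x2 matrix on top of each other."""
    return [X[0][0], X[1][0], X[0][1], X[1][1]]


def diffusion_node_metric(k, alpha, beta, mode, mu, noise_vars, eigs, power):
    """Evaluate (32) when power=2 (EMSE) or (35) when power=1 (MSD) at node k in {1, 2}.

    mode is 'cta' (Q = I_2) or 'atc' (Q = A); eigs lists the eigenvalues of R_u.
    The case alpha = beta = 1 is excluded, as in the text.
    """
    A = [[alpha, 1.0 - beta], [1.0 - alpha, beta]]
    s = 2.0 - alpha - beta
    T = [[1.0 - beta, 1.0], [1.0 - alpha, -1.0]]
    Tinv = [[1.0 / s, 1.0 / s], [(1.0 - alpha) / s, (beta - 1.0) / s]]
    d = [1.0, alpha + beta - 1.0]  # eigenvalues of A
    Rv = [[noise_vars[0], 0.0], [0.0, noise_vars[1]]]
    Q = A if mode == "atc" else [[1.0, 0.0], [0.0, 1.0]]
    QT = matmul(Q, T)
    x = vec(matmul(transpose(QT), matmul(Rv, QT)))     # vec(T^T Q^T R_v Q T)
    E = [[1.0 if i == j == k - 1 else 0.0 for j in range(2)] for i in range(2)]
    y = vec(matmul(Tinv, matmul(E, transpose(Tinv))))  # vec(T^{-1} E_kk T^{-T})
    dd = [d[0] * d[0], d[1] * d[0], d[0] * d[1], d[1] * d[1]]  # diagonal of D kron D
    total = 0.0
    for lam in eigs:
        xi = 1.0 - 2.0 * mu * lam
        total += lam ** power * sum(xj * yj / (1.0 - xi * ddj) for xj, yj, ddj in zip(x, y, dd))
    return mu ** 2 * total
```

Averaging the values for `k=1` and `k=2` gives the network EMSE or MSD, which can then be compared against the closed-form expressions derived in the following subsections.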



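C. EMSE and MSD of CTA Diffusion LMS

Setting $Q = I_2$, we specialize (32) to obtain the EMSE expression for the CTA algorithm:

$$\mathrm{EMSE}_{\mathrm{cta},k} \approx \mu^2 \sum_{m=1}^{M} \lambda_m^2 \, \mathrm{vec}(T^T R_v T)^T (I_4 - \xi_m D \otimes D)^{-1} \mathrm{vec}(T^{-1} E_{kk} T^{-T}) \tag{36}$$

for $k = 1, 2$. Substituting (31) into (36), some algebra will show that the network EMSE for the CTA
algorithm, which is defined as the average EMSE of the individual nodes, is given by

$$\mathrm{EMSE}_{\mathrm{cta}} \approx \sum_{m=1}^{M} \frac{\mu^2 \lambda_m^2}{(2-\alpha-\beta)^2} \left\{ \frac{\sigma_{v,1}^2 (1-\beta)^2 + \sigma_{v,2}^2 (1-\alpha)^2}{1 - \xi_m} + \frac{(\alpha-\beta)\left[\sigma_{v,2}^2 (1-\alpha) - \sigma_{v,1}^2 (1-\beta)\right]}{1 - \xi_m (\alpha+\beta-1)} + \frac{(\sigma_{v,1}^2 + \sigma_{v,2}^2)\left[(1-\alpha)^2 + (1-\beta)^2\right]}{2\left[1 - \xi_m (\alpha+\beta-1)^2\right]} \right\} \tag{37}$$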

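We argue in Appendix B that, under Assumption 2 (i.e., for sufficiently small step-sizes), the network
EMSE in (37) is essentially minimized when $\{\alpha, \beta\}$ are chosen as

$$\alpha = \frac{\sigma_{v,1}^{-2}}{\sigma_{v,1}^{-2} + \sigma_{v,2}^{-2}}, \quad \beta = \frac{\sigma_{v,2}^{-2}}{\sigma_{v,1}^{-2} + \sigma_{v,2}^{-2}} \tag{38}$$

This choice coincides with the relative degree-variance rule proposed in [4] (see Footnote 1). In the sequel we will
compare the performance of the diffusion strategies that result from this choice of combination weights
against the performance of the block and incremental LMS strategies (14) and (15).

Footnote 1: There is a typo in Table III of [4], where the noise variances for the relative degree-variance rule should appear inverted.

The value of (37) that corresponds to the choice (38) is then given by

$$\mathrm{EMSE}_{\mathrm{cta}}^{\mathrm{opt}} \approx \frac{\sigma_{v,1}^2 \sigma_{v,2}^2}{\sigma_{v,1}^2 + \sigma_{v,2}^2} \cdot \frac{\mu \mathrm{Tr}(R_u)}{2} + \frac{\mu^2 (\sigma_{v,1}^4 + \sigma_{v,2}^4)}{2(\sigma_{v,1}^2 + \sigma_{v,2}^2)} \sum_{m=1}^{M} \lambda_m^2 \tag{39}$$

and the corresponding EMSE values at the nodes are

$$\mathrm{EMSE}_{\mathrm{cta},k}^{\mathrm{opt}} \approx \frac{\sigma_{v,1}^2 \sigma_{v,2}^2}{\sigma_{v,1}^2 + \sigma_{v,2}^2} \cdot \frac{\mu \mathrm{Tr}(R_u)}{2} + \frac{\mu^2 \sigma_{v,k}^4}{\sigma_{v,1}^2 + \sigma_{v,2}^2} \sum_{m=1}^{M} \lambda_m^2 \tag{40}$$

for $k = 1, 2$. Similarly, the network MSD is approximately minimized for the same choice (38) and its
value is given by

$$\mathrm{MSD}_{\mathrm{cta}}^{\mathrm{opt}} \approx \frac{\sigma_{v,1}^2 \sigma_{v,2}^2}{\sigma_{v,1}^2 + \sigma_{v,2}^2} \cdot \frac{\mu M}{2} + \frac{\mu^2 (\sigma_{v,1}^4 + \sigma_{v,2}^4)}{2(\sigma_{v,1}^2 + \sigma_{v,2}^2)} \mathrm{Tr}(R_u) \tag{41}$$

The corresponding MSD values at the nodes are

$$\mathrm{MSD}_{\mathrm{cta},k}^{\mathrm{opt}} \approx \frac{\sigma_{v,1}^2 \sigma_{v,2}^2}{\sigma_{v,1}^2 + \sigma_{v,2}^2} \cdot \frac{\mu M}{2} + \frac{\mu^2 \sigma_{v,k}^4}{\sigma_{v,1}^2 + \sigma_{v,2}^2} \mathrm{Tr}(R_u) \tag{42}$$

for $k = 1, 2$. We shall refer to the CTA diffusion algorithm that uses (38) as the optimal CTA implementation.
Note that selecting the coefficients as in (38) requires knowledge of the noise variances at both
nodes. This information is usually unavailable. Nevertheless, it is possible to develop adaptive strategies
to adjust the coefficients $\{\alpha, \beta\}$ on the fly based on the available data without requiring the nodes to
know beforehand the noise profile in the network (see [2], [37], BD1 and (103) and (106) further ahead).
We therefore continue the analysis by assuming the nodes are able to determine (or learn) the coefficients
(38).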
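The closed-form network EMSE (37) and the weight selection (38) can be explored numerically. The sketch below is illustrative only: the function names `cta_network_emse` and `relative_variance_weights` are hypothetical, and the weights are computed as inverse noise variances normalized to sum to one, which is how the relative degree-variance rule reads once the correction in the footnote is applied.

```python
def cta_network_emse(alpha, beta, mu, var1, var2, eigs):
    """Network EMSE of two-node CTA diffusion LMS, closed form (37).

    eigs lists the eigenvalues of R_u; alpha = beta = 1 is excluded.
    """
    s = 2.0 - alpha - beta
    d = alpha + beta - 1.0
    total = 0.0
    for lam in eigs:
        xi = 1.0 - 2.0 * mu * lam
        t1 = (var1 * (1 - beta) ** 2 + var2 * (1 - alpha) ** 2) / (1 - xi)
        t2 = (alpha - beta) * (var2 * (1 - alpha) - var1 * (1 - beta)) / (1 - xi * d)
        t3 = (var1 + var2) * ((1 - alpha) ** 2 + (1 - beta) ** 2) / (2 * (1 - xi * d * d))
        total += mu ** 2 * lam ** 2 * (t1 + t2 + t3) / s ** 2
    return total


def relative_variance_weights(var1, var2):
    """Combination weights in the spirit of (38): inverse-variance weighting."""
    inv1, inv2 = 1.0 / var1, 1.0 / var2
    return inv1 / (inv1 + inv2), inv2 / (inv1 + inv2)
```

With these weights, $\alpha + \beta = 1$, so both nodes apply the same convex combination and the noisier node's estimate receives the smaller weight. Evaluating `cta_network_emse` over a grid of $(\alpha, \beta)$ is a simple way to see how flat the cost is around this choice for small step-sizes.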




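D. EMSE and MSD of ATC Diffusion LMS

Likewise, setting $Q = A$, we specialize (32) to obtain the EMSE expression for the ATC algorithm:

$$\mathrm{EMSE}_{\mathrm{atc},k} \approx \mu^2 \sum_{m=1}^{M} \lambda_m^2 \, \mathrm{vec}(T^T A^T R_v A T)^T (I_4 - \xi_m D \otimes D)^{-1} \mathrm{vec}(T^{-1} E_{kk} T^{-T}) \tag{43}$$

for $k = 1, 2$. Following similar arguments to the CTA case, the network EMSE is given by

$$\mathrm{EMSE}_{\mathrm{atc}} \approx \sum_{m=1}^{M} \frac{\mu^2 \lambda_m^2}{(2-\alpha-\beta)^2} \left\{ \frac{\sigma_{v,1}^2 (1-\beta)^2 + \sigma_{v,2}^2 (1-\alpha)^2}{1 - \xi_m} + \frac{(\alpha-\beta)(\alpha+\beta-1)\left[\sigma_{v,2}^2 (1-\alpha) - \sigma_{v,1}^2 (1-\beta)\right]}{1 - \xi_m (\alpha+\beta-1)} + \frac{(\sigma_{v,1}^2 + \sigma_{v,2}^2)(\alpha+\beta-1)^2 \left[(1-\alpha)^2 + (1-\beta)^2\right]}{2\left[1 - \xi_m (\alpha+\beta-1)^2\right]} \right\} \tag{44}$$

We can again verify that, under Assumption 2, expression (44) is approximately minimized for the same
choice (38) (see Appendix B). The resulting network EMSE value is given by

$$\mathrm{EMSE}_{\mathrm{atc}}^{\mathrm{opt}} \approx \frac{\sigma_{v,1}^2 \sigma_{v,2}^2}{\sigma_{v,1}^2 + \sigma_{v,2}^2} \cdot \frac{\mu \mathrm{Tr}(R_u)}{2} \tag{45}$$
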

March 8, 2013 



DRAFT 



14 



and the corresponding EMSE values at the nodes are 




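$$\mathrm{EMSE}_{\mathrm{atc},k}^{\mathrm{opt}} \approx \frac{\sigma_{v,1}^2 \sigma_{v,2}^2}{\sigma_{v,1}^2 + \sigma_{v,2}^2} \cdot \frac{\mu \mathrm{Tr}(R_u)}{2} \tag{46}$$

for $k = 1, 2$. Similarly, the network MSD is approximately minimized for the same choice (38); its value
is given by

$$\mathrm{MSD}_{\mathrm{atc}}^{\mathrm{opt}} \approx \frac{\sigma_{v,1}^2 \sigma_{v,2}^2}{\sigma_{v,1}^2 + \sigma_{v,2}^2} \cdot \frac{\mu M}{2} \tag{47}$$

and the corresponding MSD values at the nodes are

$$\mathrm{MSD}_{\mathrm{atc},k}^{\mathrm{opt}} \approx \frac{\sigma_{v,1}^2 \sigma_{v,2}^2}{\sigma_{v,1}^2 + \sigma_{v,2}^2} \cdot \frac{\mu M}{2} \tag{48}$$

for $k = 1, 2$. We shall refer to the ATC diffusion algorithm that uses (38) as the optimal ATC implementation.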
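The ATC expression (44) can be evaluated in the same way as its CTA counterpart. The following sketch is an illustration under stated assumptions: the function name `atc_network_emse` is hypothetical, and the inverse-variance weights are written directly as $\alpha = \sigma_{v,2}^2 / (\sigma_{v,1}^2 + \sigma_{v,2}^2)$ and $\beta = 1 - \alpha$.

```python
def atc_network_emse(alpha, beta, mu, var1, var2, eigs):
    """Network EMSE of two-node ATC diffusion LMS, closed form (44).

    eigs lists the eigenvalues of R_u; alpha = beta = 1 is excluded.
    """
    s = 2.0 - alpha - beta
    d = alpha + beta - 1.0
    total = 0.0
    for lam in eigs:
        xi = 1.0 - 2.0 * mu * lam
        t1 = (var1 * (1 - beta) ** 2 + var2 * (1 - alpha) ** 2) / (1 - xi)
        t2 = (alpha - beta) * d * (var2 * (1 - alpha) - var1 * (1 - beta)) / (1 - xi * d)
        t3 = (var1 + var2) * d * d * ((1 - alpha) ** 2 + (1 - beta) ** 2) / (2 * (1 - xi * d * d))
        total += mu ** 2 * lam ** 2 * (t1 + t2 + t3) / s ** 2
    return total


def atc_optimal_weights(var1, var2):
    """Inverse-variance weights: the noisier node's estimate gets the smaller weight."""
    alpha = var2 / (var1 + var2)
    return alpha, 1.0 - alpha
```

When $\alpha + \beta = 1$, the second and third terms of (44) vanish and only the first term remains, which is why the optimal ATC value carries no second-order term in the step-size, unlike the optimal CTA value.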




E. Uniform CTA and ATC Diffusion LMS 

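Uniform CTA and ATC diffusion LMS correspond to the choice $\alpha = \beta = 0.5$, which means that the
two nodes equally trust each other's estimates. This situation coincides with the uniform combination
rule [4]. According to (37) and (35), the network EMSE and MSD for uniform CTA are

$$\mathrm{EMSE}_{\mathrm{cta}}^{\mathrm{unf}} \approx \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \left( \frac{\mu \mathrm{Tr}(R_u)}{4} + \frac{\mu^2}{2} \sum_{m=1}^{M} \lambda_m^2 \right) \tag{49}$$

and

$$\mathrm{MSD}_{\mathrm{cta}}^{\mathrm{unf}} \approx \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \left( \frac{\mu M}{4} + \frac{\mu^2 \mathrm{Tr}(R_u)}{2} \right) \tag{50}$$

Similarly, according to (44) and (35), the network EMSE and MSD for uniform ATC are

$$\mathrm{EMSE}_{\mathrm{atc}}^{\mathrm{unf}} \approx \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \cdot \frac{\mu \mathrm{Tr}(R_u)}{4} \tag{51}$$

and

$$\mathrm{MSD}_{\mathrm{atc}}^{\mathrm{unf}} \approx \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \cdot \frac{\mu M}{4} \tag{52}$$

F. EMSE and MSD of Block LMS and Incremental LMS

In Appendix C we derive the EMSE and MSD for the block LMS implementation (14) and arrive at

$$\mathrm{EMSE}_{\mathrm{blk}} \approx \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \cdot \frac{\mu' \mathrm{Tr}(R_u)}{2} \tag{53}$$

and

$$\mathrm{MSD}_{\mathrm{blk}} \approx \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \cdot \frac{\mu' M}{2} \tag{54}$$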

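With regard to the incremental LMS algorithm (15), we note from Assumption 2 that the step-size $\mu'$
is sufficiently small so that we can assume $\mu' \mathrm{Tr}(R_u) \ll 1$. Then, from (15) we get

$$\begin{aligned} \boldsymbol{w}_i &= \boldsymbol{w}_{i-1} + \mu' \left[ \boldsymbol{u}_{1,i}^* (\boldsymbol{d}_1(i) - \boldsymbol{u}_{1,i} \boldsymbol{w}_{i-1}) + \boldsymbol{u}_{2,i}^* (\boldsymbol{d}_2(i) - \boldsymbol{u}_{2,i} \boldsymbol{w}_{i-1}) \right] - \mu'^2 \boldsymbol{u}_{2,i}^* \boldsymbol{u}_{2,i} \boldsymbol{u}_{1,i}^* (\boldsymbol{d}_1(i) - \boldsymbol{u}_{1,i} \boldsymbol{w}_{i-1}) \\ &\approx \boldsymbol{w}_{i-1} + \mu' \begin{bmatrix} \boldsymbol{u}_{1,i} \\ \boldsymbol{u}_{2,i} \end{bmatrix}^* \left( \begin{bmatrix} \boldsymbol{d}_1(i) \\ \boldsymbol{d}_2(i) \end{bmatrix} - \begin{bmatrix} \boldsymbol{u}_{1,i} \\ \boldsymbol{u}_{2,i} \end{bmatrix} \boldsymbol{w}_{i-1} \right) \end{aligned} \tag{55}$$

which means that the incremental LMS update (15) can be well approximated by the block LMS update
(14). Then, the EMSE of incremental LMS (15) can be well approximated by (reference [25] provides
a more detailed analysis of the performance of adaptive incremental LMS strategies):

$$\mathrm{EMSE}_{\mathrm{inc}} \approx \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \cdot \frac{\mu' \mathrm{Tr}(R_u)}{2} \tag{56}$$

and its MSD as

$$\mathrm{MSD}_{\mathrm{inc}} \approx \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \cdot \frac{\mu' M}{2} \tag{57}$$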

It is worth noting that although (53) and (56) are similar for small step-sizes, incremental LMS actually
outperforms block LMS [44] because the former uses the intermediate estimate $\boldsymbol{\phi}_i$ during one step of the
update in (15) while the latter does not. The intermediate estimate $\boldsymbol{\phi}_i$ is generally "less noisy" than $\boldsymbol{w}_{i-1}$
so that incremental LMS generally outperforms block LMS. However, we shall not distinguish between
incremental LMS and block LMS in this work, when we compare their performance with other strategies 
in the sequel. 



G. Summary 

We list the expressions for the network EMSE and MSD for the various strategies under Assumptions 
Q3-I3] in Table H and the expressions for the individual nodes in Tables JI] and [TTTl respectively. It is 
worth noting from these expressions that the EMSE is dependent on the step-size parameter. In order 
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TABLE I 

Network EMSE and MSD for Various Strategies over Two-Node LMS Adaptive Networks 



Type 


Network EMSE 


Network MSD 


Optimal ATC (ED 


c l°"harm 




c l ^harrn 


{47} 


Optimal CTA © 


ClCTharm + C 2 (2<7 arth - of aml ) 


CS} 


/ 2 | / In 2 2 \ 
ClOharm + c 2 ^C"arth — (Tharai J 


ED 


Uniform ATC @D 


dearth 




Cl^arth 


m 


Uniform CTA ll20l 


(ci + 02)0^ 




+C 2 )cr a 2 rth 


us 


Block LMS 03 


2C3Cr^, h 




2c3CT a ^ th 


in 


Incremental LMS (Q3} 




(|56j 


2c3CT a ^ th 


{57} 


Stand-alone LMS © 


2ci<rJ th 


426} 


2c'i cr^ 


127} 



1 A = col{Ai, . . . , Ajv} consists of the eigenvalues of R u , = ' T^ '^ 1 2 ' T "' 2 and cr 2 ^ = ■ 

2 A |.Tr(ii„) A M 2 ||A|| 2 A M 'Tf(fl u ) / A M A/ / A M 2 Tr(fl„) , / A M 'M 
c l — 4 ' c 2 — 2 ' C3 — 4 ' Cl — 4~> 1=2 ~~ 2 ' anQ C 3 — "I - • 

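TABLE II

EMSE for the Individual Nodes in Various Strategies over Two-Node LMS Adaptive Networks

Type | EMSE of Node 1 | EMSE of Node 2
--- | --- | ---
Optimal ATC (21) | $c_1\sigma^2_{\mathrm{harm}}$ (46) | $c_1\sigma^2_{\mathrm{harm}}$ (46)
Optimal CTA (20) | $c_1\sigma^2_{\mathrm{harm}} + c_2\dfrac{\sigma_{v,1}^4}{\sigma^2_{\mathrm{arth}}}$ (40) | $c_1\sigma^2_{\mathrm{harm}} + c_2\dfrac{\sigma_{v,2}^4}{\sigma^2_{\mathrm{arth}}}$ (40)
Stand-alone LMS (3) | $2c_1\sigma_{v,1}^2$ (24) | $2c_1\sigma_{v,2}^2$ (24)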



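TABLE III

MSD for the Individual Nodes in Various Strategies over Two-Node LMS Adaptive Networks

Type | MSD of Node 1 | MSD of Node 2
--- | --- | ---
Optimal ATC (21) | … | …
Optimal CTA (20) | $c_1'\sigma^2_{\mathrm{harm}} + c_2'\dfrac{\sigma_{v,1}^4}{\sigma^2_{\mathrm{arth}}}$ | $c_1'\sigma^2_{\mathrm{harm}} + c_2'\dfrac{\sigma_{v,2}^4}{\sigma^2_{\mathrm{arth}}}$
Stand-alone LMS (3) | $2c_1'\sigma_{v,1}^2$ (25) | $2c_1'\sigma_{v,2}^2$ (25)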
to compare the EMSE of the algorithms in a fair manner, the step-sizes need to be tuned appropriately
because algorithms generally differ in terms of their convergence rates and steady-state performance. Some 
algorithms converge faster but may have larger EMSE. Others may have smaller EMSE but converge 
slower. Therefore, the step-sizes should be adjusted in such a way that all algorithms exhibit similar 
convergence rates. Then, under these conditions, the EMSE values can be fairly compared. We proceed 
to explain this issue in greater detail in the next section. 

V. Performance Comparison for Various Adaptive Strategies 

Adaptive algorithms differ in their mean-square convergence rates and in their steady-state mean- 
square error performance. In order to ensure a fair comparison among algorithms, we should either fix 
their convergence rates at the same value and then compare the resulting steady-state performance, or we 
should fix the steady-state performance and then compare the convergence rates. To clarify this procedure 
further, we consider the concept of "operation curves (OC)". 

A. Operation Curves for Adaptive Strategies 

The OC of an algorithm has two axes: the horizontal axis represents its EMSE and the vertical axis
represents its (mean-square) convergence rate. Each point on the OC corresponds to a choice of the
step-size parameter. Now the EMSE and convergence rate of an adaptive implementation, such as
stand-alone LMS, are both dependent on the step-size parameter used by the algorithm. For example, under
Assumptions 1-3, the EMSE of a stand-alone LMS filter of the type (3), denoted by $\zeta$, is a function
of $\mu$ and is given by [16]:

$$\zeta(\mu) \approx \frac{\mu\,\sigma_v^2\,\mathrm{Tr}(R_u)}{2} \qquad (58)$$

The function $\zeta(\mu)$ is monotonically increasing in $\mu$. It is clear from (58) that the smaller the value of $\mu$,
the lower the EMSE (which is desirable). However, a smaller step-size $\mu$ results in slower mean-square
convergence. This is because, under Assumptions 1-3, the modes of convergence for a stand-alone LMS
implementation (0 are approximately given by [fl2] p. 360]: 

$$\xi_m = 1 - 2\mu\lambda_m \qquad (59)$$

for $m = 1, \ldots, M$, where the $\{\lambda_m\}$ are the eigenvalues of $R_u$. The value of $\xi_m$ that is closest to the
unit circle determines the rate of convergence of $\mathbb{E}\tilde{w}_i$ and $\mathbb{E}\|\tilde{w}_i\|^2$ towards their steady-state values. It
is clear from (59) that the smaller $\mu$ is, the closer the mode is to the unit circle, and the slower the
convergence of the algorithm will be. Hence, under Assumption 2,

• Increasing $\mu$ results in faster convergence at the cost of a higher (worse) EMSE.

• Decreasing $\mu$ results in slower convergence but a lower (better) EMSE.

For this reason, in order to compare fairly the performance of various algorithms, we need to jointly 
examine their EMSE and convergence rates. It is worth noting that the concept of operation curves 
can also be applied to other steady-state performance metrics such as the MSD. However, due to space 
limitations, we focus on the EMSE in this work. 

1) Operation Curve for Stand-Alone LMS: For stand-alone LMS filters, under Assumptions 1-3, the
average network EMSE is given by (26) and the dominant mode (the one that is closest to the unit circle)
is given by

$$\mathrm{mode}_{\mathrm{ind}} \approx 1 - 2\mu\lambda_{\min}(R_u) \qquad (60)$$

where $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue of its matrix argument.
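The trade-off that an operation curve captures can be sketched numerically. The snippet below is only an illustration: it assumes that (58) takes the standard small step-size form $\mu\sigma_v^2\mathrm{Tr}(R_u)/2$ (which agrees with the stand-alone entries $2c_1\sigma_{v,k}^2$ in Table II), and the function name, the step-size grid, and the choice $R_u = I_M$ with $M = 10$ are assumptions made for the example.

```python
import math

def standalone_operating_point(mu, eigenvalues, noise_var):
    """Return (EMSE in dB, dominant mode) of stand-alone LMS for step-size mu."""
    emse = mu * noise_var * sum(eigenvalues) / 2.0   # assumed form of (58)
    mode = 1.0 - 2.0 * mu * min(eigenvalues)         # dominant mode, as in (60)
    return 10.0 * math.log10(emse), mode

eigenvalues = [1.0] * 10                  # R_u = I_M with M = 10 (illustrative)
step_sizes = [0.001, 0.002, 0.005, 0.01]  # hypothetical grid of step-sizes
curve = [standalone_operating_point(mu, eigenvalues, 0.01) for mu in step_sizes]
for mu, (emse_db, mode) in zip(step_sizes, curve):
    print(f"mu={mu:.3f}  EMSE={emse_db:7.2f} dB  mode={mode:.3f}")
```

Each printed pair is one point on the curve: a larger step-size moves the point towards a higher EMSE and a mode further from one (faster convergence).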

2) Operation Curve for CTA Diffusion LMS: Based on Assumptions 1-3, the expressions for the
network EMSE of optimal CTA and uniform CTA are given by (39) and (49), respectively. Meanwhile,
from expression (43) in [4], we know that the modes of convergence for CTA algorithms are determined
by the eigenvalues of $[A \otimes (I_M - \mu R_u)] \otimes [A \otimes (I_M - \mu R_u)]$. Now recall that $A$ has two real
eigenvalues at $\{1, \alpha + \beta - 1\}$. The second eigenvalue is smaller than one in magnitude. Therefore, the
dominant mode for CTA algorithms is given by

$$\mathrm{mode}^{\mathrm{opt}}_{\mathrm{cta}} \approx \mathrm{mode}^{\mathrm{unf}}_{\mathrm{cta}} \approx 1 - 2\mu\lambda_{\min}(R_u) \qquad (61)$$

3) Operation Curve for ATC Diffusion LMS: Based on Assumptions 1-3, the network EMSE for
optimal ATC and uniform ATC are given by (45) and (51), respectively. The modes of mean-square
convergence for ATC algorithms are also determined by the eigenvalues of
$[A \otimes (I_M - \mu R_u)] \otimes [A \otimes (I_M - \mu R_u)]$ [4]. Therefore, the dominant mode for ATC is also

$$\mathrm{mode}^{\mathrm{opt}}_{\mathrm{atc}} \approx \mathrm{mode}^{\mathrm{unf}}_{\mathrm{atc}} \approx 1 - 2\mu\lambda_{\min}(R_u) \qquad (62)$$

4) Operation Curves for Block LMS and Incremental LMS: The EMSE for block LMS and incremental
LMS are given by (53) and (56), respectively. In Appendix C we show that their dominant mode is

$$\mathrm{mode}_{\mathrm{blk}} \approx \mathrm{mode}_{\mathrm{inc}} \approx 1 - 4\mu'\lambda_{\min}(R_u) \qquad (63)$$

We plot the operation curves for all algorithms in Fig. 4. From the figure we observe that (i) optimal
ATC and optimal CTA have similar performance and outperform all other strategies; (ii) block LMS
and incremental LMS have similar performance to uniform ATC and uniform CTA; (iii) the
non-cooperative stand-alone LMS implementation has the worst performance. In the following, we shall fix
the convergence rate at the same value for all strategies and then compare their EMSE levels analytically.

B. Common Convergence Rate 

As was mentioned before, the performance of each algorithm is dictated by two factors: its steady-state
EMSE and its mean-square convergence rate, and both factors are functions of the step-size $\mu$. In order
to make a fair comparison among the algorithms, we shall fix one factor and then compare them in terms
of the other factor, and vice versa.

From (60), (61), and (62), we know that ATC algorithms, CTA algorithms, and stand-alone LMS filters
have (approximately) the same dominant mode for mean-square convergence:

$$\mathrm{mode}_1 = 1 - 2\mu\lambda_{\min}(R_u) \qquad (64)$$

For block LMS and incremental LMS, from (63), their dominant mode of convergence is approximately

$$\mathrm{mode}_2 = 1 - 4\mu'\lambda_{\min}(R_u) \qquad (65)$$

In order to make all algorithms converge at the same rate, i.e., $\mathrm{mode}_1 = \mathrm{mode}_2$, we enforce the relation:

$$\mu' = \frac{\mu}{2} \qquad (66)$$

An intuitive explanation for (66) is that, for a set of data $\{d_1(i), d_2(i); u_{1,i}, u_{2,i}\}$, incremental LMS
performs two successive iterations while each stand-alone LMS filter performs only one iteration. For
this reason, the step-size of incremental LMS needs to be half the value of that for stand-alone LMS
in order for both classes of algorithms to converge at the same rate. Based on condition (66), we now
modify Table I into Table IV and proceed to compare the EMSE for various strategies. Although we
focus on comparing the EMSE performance, similar arguments can be applied to the MSD performance
of the various strategies.
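As a quick numerical illustration of condition (66), the following sketch evaluates the two dominant modes (64) and (65) and confirms that they coincide once the block/incremental step-size is halved. The function name and the numerical values are assumptions chosen for the example.

```python
def dominant_modes(mu, mu_prime, lam_min):
    """Dominant modes of diffusion/stand-alone LMS (64) and block/incremental LMS (65)."""
    mode_1 = 1.0 - 2.0 * mu * lam_min        # ATC, CTA, and stand-alone LMS
    mode_2 = 1.0 - 4.0 * mu_prime * lam_min  # block and incremental LMS
    return mode_1, mode_2

mu = 0.01                                    # illustrative step-size
matched = dominant_modes(mu, mu / 2.0, lam_min=1.0)    # condition (66)
unmatched = dominant_modes(mu, mu, lam_min=1.0)        # same step-size for both
print(matched, unmatched)
```

With the same step-size for both families, block and incremental LMS would converge faster (mode further from one), so their EMSE values would not be comparable on an equal footing.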

C. Comparing Network EMSE 

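We use Table IV to compare the network EMSE. First, the harmonic and arithmetic means of $\{\sigma_{v,1}^2, \sigma_{v,2}^2\}$
are defined as

$$\sigma^2_{\mathrm{harm}} \triangleq \frac{2\sigma_{v,1}^2\sigma_{v,2}^2}{\sigma_{v,1}^2 + \sigma_{v,2}^2}, \qquad \sigma^2_{\mathrm{arth}} \triangleq \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \qquad (67)$$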
[Figure 4: operation curves (horizontal axis: EMSE in dB; vertical axis: convergence rate) for optimal ATC (45),
uniform ATC (51), optimal CTA (39), uniform CTA (49), stand-alone LMS (26), and block/incremental LMS (53)/(56).
The curves for uniform ATC and CTA and for block and incremental LMS are annotated as one group.]

Fig. 4. Operation curves for various algorithms when $M = 10$, $R_u = I_M$, $\sigma_{v,1}^2 = 0.01$, and $\sigma_{v,2}^2 = 0.001$.



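TABLE IV

Network EMSE values from Table I using $\mu = 2\mu'$

Type | Network EMSE | Acronym
--- | --- | ---
Opt. ATC (21) | $c_1\sigma^2_{\mathrm{harm}}$ (45) | $\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{atc}}$
Opt. CTA (20) | $c_1\sigma^2_{\mathrm{harm}} + c_2(2\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}})$ (39) | $\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{cta}}$
Unf. ATC (21) | $c_1\sigma^2_{\mathrm{arth}}$ (51) | $\mathrm{EMSE}^{\mathrm{unf}}_{\mathrm{atc}}$
Unf. CTA (20) | $(c_1 + c_2)\sigma^2_{\mathrm{arth}}$ (49) | $\mathrm{EMSE}^{\mathrm{unf}}_{\mathrm{cta}}$
Blk. LMS (14) | $c_1\sigma^2_{\mathrm{arth}}$ (53) | $\mathrm{EMSE}_{\mathrm{blk}}$
Inc. LMS (15) | $c_1\sigma^2_{\mathrm{arth}}$ (56) | $\mathrm{EMSE}_{\mathrm{inc}}$
Std. LMS (3) | $2c_1\sigma^2_{\mathrm{arth}}$ (26) | $\mathrm{EMSE}_{\mathrm{ind}}$

¹ $\sigma^2_{\mathrm{arth}} = \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2}$ and $\sigma^2_{\mathrm{harm}} = \frac{2\sigma_{v,1}^2\sigma_{v,2}^2}{\sigma_{v,1}^2 + \sigma_{v,2}^2}$, where $\sigma_{v,2}^2 < \sigma_{v,1}^2$.

² $c_1 = \frac{\mu\mathrm{Tr}(R_u)}{4}$ and $c_2 = \frac{\mu^2\|\lambda\|^2}{2}$.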



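and it holds that $\sigma^2_{\mathrm{harm}} < \sigma^2_{\mathrm{arth}}$. As a result, it is easy to verify that

$$\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{atc}} < \mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{cta}} \quad \text{and} \quad \mathrm{EMSE}^{\mathrm{unf}}_{\mathrm{atc}} < \mathrm{EMSE}^{\mathrm{unf}}_{\mathrm{cta}} \qquad (68)$$

although their values are close to each other since $c_2$ is proportional to $\mu^2$ and $\mu$ is assumed to be
sufficiently small by Assumption 2. For CTA-type algorithms, it is further straightforward to verify that

$$\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{cta}} < \mathrm{EMSE}^{\mathrm{unf}}_{\mathrm{cta}} < \mathrm{EMSE}_{\mathrm{ind}} \qquad (69)$$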
TABLE V

Comparing Network EMSEs for Various Algorithms

 | Opt. ATC | Opt. CTA | Unf. ATC | Unf. CTA | Blk./Inc. LMS¹ | Std. LMS
--- | --- | --- | --- | --- | --- | ---
Opt. ATC | - | better³ | better | better | better | better
Opt. CTA | worse³ | - | better² | better | better² | better
Unf. ATC | worse | worse² | - | better³ | equal | better
Unf. CTA | worse | worse | worse³ | - | worse³ | better
Blk./Inc. LMS¹ | worse | worse² | equal | better³ | - | better
Std. LMS | worse | worse | worse | worse | worse | -

¹ The step-sizes for block and incremental LMS are half the value for the other algorithms.

² If $c_2/c_1 = 2\mu\|\lambda\|^2/\mathrm{Tr}(R_u) < (\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}})/(2\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}})$, which is generally true under Assumption 2.

³ By a small margin on the order of $\mu^2$.



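since, under Assumption 2,

$$\begin{aligned}
c_2 < c_1 &\Longrightarrow c_2(\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}}) < c_1(\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}}) \\
&\Longrightarrow c_1\sigma^2_{\mathrm{harm}} + c_2\sigma^2_{\mathrm{arth}} < c_1\sigma^2_{\mathrm{arth}} + c_2\sigma^2_{\mathrm{harm}} \\
&\Longrightarrow c_1\sigma^2_{\mathrm{harm}} + 2c_2\sigma^2_{\mathrm{arth}} < (c_1 + c_2)\sigma^2_{\mathrm{arth}} + c_2\sigma^2_{\mathrm{harm}} \\
&\Longrightarrow c_1\sigma^2_{\mathrm{harm}} + c_2(2\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}}) < (c_1 + c_2)\sigma^2_{\mathrm{arth}}
\end{aligned}$$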

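Similarly, for ATC-type algorithms, we get

$$\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{atc}} < \mathrm{EMSE}^{\mathrm{unf}}_{\mathrm{atc}} < \mathrm{EMSE}_{\mathrm{ind}} \qquad (70)$$

The relation between optimal CTA and uniform ATC depends on the parameters $\{\sigma^2_{\mathrm{harm}}, \sigma^2_{\mathrm{arth}}, c_1, c_2\}$ since

$$\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{cta}} < \mathrm{EMSE}^{\mathrm{unf}}_{\mathrm{atc}} \iff \frac{c_2}{c_1} < \frac{\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}}}{2\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}}} \qquad (71)$$

which is usually true under Assumption 2. Uniform ATC, block LMS, and incremental LMS have the
same performance:

$$\mathrm{EMSE}_{\mathrm{blk}} = \mathrm{EMSE}_{\mathrm{inc}} = \mathrm{EMSE}^{\mathrm{unf}}_{\mathrm{atc}} \qquad (72)$$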
Hence, optimal ATC outperforms block LMS and incremental LMS:

$$\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{atc}} < \mathrm{EMSE}_{\mathrm{blk/inc}} \qquad (73)$$

The relations between optimal CTA, block LMS, and incremental LMS also depend on the parameters
$\{\sigma^2_{\mathrm{harm}}, \sigma^2_{\mathrm{arth}}, c_1, c_2\}$ since

$$\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{cta}} < \mathrm{EMSE}_{\mathrm{blk/inc}} \iff \frac{c_2}{c_1} < \frac{\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}}}{2\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}}} \qquad (74)$$

which is the same condition as (71). Block LMS and incremental LMS outperform uniform CTA:

$$\mathrm{EMSE}_{\mathrm{blk/inc}} < \mathrm{EMSE}^{\mathrm{unf}}_{\mathrm{cta}} \qquad (75)$$

but only by a small margin since $c_2$ is proportional to $\mu^2$. We summarize the network EMSE relationships
in Table V. Entries of Table V should be read from left to right. For example, the entry (in italics) on
the second row and third column should be read to mean: "optimal ATC is better than optimal CTA (i.e.,
it results in lower EMSE)".
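The orderings summarized above can be checked numerically from the Table IV expressions. The sketch below is illustrative only: the function name and dictionary keys are assumptions, and the parameter values ($M = 10$, $R_u = I_M$, $\sigma_{v,1}^2 = 0.01$, $\sigma_{v,2}^2 = 0.002$, $\mu = 0.01$) are borrowed from the simulation settings reported for Fig. 5.

```python
def two_node_network_emse(mu, eigenvalues, var1, var2):
    """Network EMSE of each strategy per Table IV (block/incremental LMS use mu' = mu/2)."""
    c1 = mu * sum(eigenvalues) / 4.0                             # c1 = mu Tr(R_u) / 4
    c2 = mu ** 2 * sum(lam ** 2 for lam in eigenvalues) / 2.0    # c2 = mu^2 ||lambda||^2 / 2
    harm = 2.0 * var1 * var2 / (var1 + var2)                     # harmonic mean, (67)
    arth = (var1 + var2) / 2.0                                   # arithmetic mean, (67)
    return {
        "opt_atc": c1 * harm,
        "opt_cta": c1 * harm + c2 * (2.0 * arth - harm),
        "unf_atc": c1 * arth,
        "unf_cta": (c1 + c2) * arth,
        "blk_inc": c1 * arth,
        "ind": 2.0 * c1 * arth,
    }

emse = two_node_network_emse(mu=0.01, eigenvalues=[1.0] * 10, var1=0.01, var2=0.002)
for name, value in sorted(emse.items(), key=lambda item: item[1]):
    print(f"{name:8s} {value:.3e}")
```

For these values the condition $c_2/c_1 < (\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}})/(2\sigma^2_{\mathrm{arth}} - \sigma^2_{\mathrm{harm}})$ holds, so optimal CTA also lands below uniform ATC and block/incremental LMS.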



D. Comparing Individual Node EMSE 

We compare the EMSE of node 1 under various strategies using Table II. First, node 1 in optimal
ATC outperforms that in optimal CTA: $\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{atc},1} < \mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{cta},1}$. But more importantly, node 1 in optimal
CTA outperforms that in stand-alone LMS because $\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{cta},1} < \mathrm{EMSE}_{\mathrm{ind},1}$ when $c_2 < c_1$, which is true
under Assumption 2. Recall that node 1 has larger noise variance than node 2. Therefore, ATC and CTA
cooperation helps it attain a better EMSE value than what it would obtain if it operated independently.

Likewise, we compare the EMSE of node 2 using Table II. Node 2 in optimal ATC performs better
than that in optimal CTA: $\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{atc},2} < \mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{cta},2}$. Again, and importantly, node 2 in optimal CTA
outperforms that in stand-alone LMS because $\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{cta},2} < \mathrm{EMSE}_{\mathrm{ind},2}$ when $c_2 < c_1$, which is again true
under Assumption 2. Although node 2 has less noise than node 1, it still benefits from cooperating with
node 1 and is able to reduce its EMSE below what it would obtain if it operated independently.

The relations between the EMSE for both nodes 1 and 2 under various strategies are the same; node
2 always outperforms node 1 due to the lower noise level. Table VI summarizes the results.

E. Simulations Results 

We compare the network EMSE for various strategies in Fig. 5. The length of $w^o$ is $M = 10$ and
its entries are randomly selected. The regression data $\{u_{k,i}\}$ and noise signals $\{v_k(i)\}$ are i.i.d. white
Gaussian distributed with zero mean and $R_u = I_M$, $\sigma_{v,1}^2 = 0.01$, and $\sigma_{v,2}^2 = 0.002$. The results are
TABLE VI

Comparing Individual Node EMSE Values for Various Strategies

 | Opt. ATC | Opt. CTA | Std. LMS
--- | --- | --- | ---
Opt. ATC | - | better | better
Opt. CTA | worse | - | better
Std. LMS | worse | worse | -





[Figure 5: network EMSE learning curves versus iteration, showing simulation and theory for optimal ATC (45),
optimal CTA (39), uniform ATC (51), uniform CTA (49), block LMS (53), incremental LMS (56), and stand-alone
LMS (26). The transient portion is annotated "Similar convergence rate".]

Fig. 5. Comparison of network EMSE when $M = 10$, $R_u = I_M$, $\sigma_{v,1}^2 = 0.01$, $\sigma_{v,2}^2 = 0.002$, and $\mu = 0.01$.



averaged over 500 trials. From the simulation results, we can see that although centralized algorithms
like (14) and (15) can offer a better estimate than the non-cooperative LMS algorithm (3), they can be
outperformed by the diffusion strategies (20) and (21). When the combination coefficients of ATC or
CTA algorithms are chosen according to the relative degree-variance rule (38), these diffusion strategies
can achieve lower EMSE by a significant margin. In addition, we compare the EMSE of each node in
the network for various strategies in Figs. 6a-6b.

VI. Performance of N-Node Adaptive Networks

In the previous sections, we focused on two-node networks and were able to analytically characterize
their performance, and to establish the superiority of the diffusion strategies over the centralized block
or incremental LMS implementations. We now extend the results to $N$-node ad-hoc networks. First, we
[Figure 6: individual-node EMSE learning curves versus iteration (0 to 1000), showing simulation and theory for
optimal ATC (46), optimal CTA (40), stand-alone LMS (24), block LMS (53), and incremental LMS (56).]

(a) Node 1. (b) Node 2.

Fig. 6. Comparison of individual EMSE when $M = 10$, $R_u = I_M$, $\sigma_{v,1}^2 = 0.01$, $\sigma_{v,2}^2 = 0.002$, and $\mu = 0.01$.



establish that for sufficiently small step-sizes and for any doubly-stochastic combination matrix A, i.e., 
its rows and columns add up to one, the ATC diffusion strategy matches the performance of centralized 
block LMS. Second, we argue that by optimizing over the larger class of left-stochastic combination 
matrices, which include doubly-stochastic matrices as well, the performance of ATC can be improved 
relative to block LMS. Third, we provide a fully-distributed construction for the combination weights in 
order to minimize the network EMSE for ATC. We illustrate the results by focusing on ATC strategies 
but they apply to CTA strategies as well. 

Thus, consider a connected network consisting of $N$ nodes. Each node $k$ collects measurement data
that satisfy the linear regression model (1). The noise variance at each node $k$ is $\sigma_{v,k}^2$. We continue to
use Assumptions 1-3. Each node $k$ runs the following ATC diffusion strategy [4]:

$$\begin{aligned}
\psi_{k,i} &= w_{k,i-1} + \mu u_{k,i}^*[d_k(i) - u_{k,i}w_{k,i-1}] \\
w_{k,i} &= \sum_{l \in \mathcal{N}_k} a_{lk}\psi_{l,i}
\end{aligned} \qquad (76)$$

where $a_{lk}$ denotes the positive weight that node $k$ assigns to data arriving from its neighbor $l$; these
weights are collected into an $N \times N$ combination matrix $A$, and $\mathcal{N}_k$ consists of all neighbors of node $k$
including $k$ itself. The weights $\{a_{lk}\}$ satisfy the following properties:

$$\sum_{l \in \mathcal{N}_k} a_{lk} = 1, \quad a_{lk} > 0 \text{ if } l \in \mathcal{N}_k, \quad \text{and} \quad a_{lk} = 0 \text{ if } l \notin \mathcal{N}_k \qquad (77)$$
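The two steps of the ATC recursion (76), adaptation followed by combination, can be sketched as follows. This is a minimal illustration for real-valued data rather than the authors' simulation code: the function name `atc_step`, the data layout (`A[l][k]` holds $a_{lk}$), and all numerical values in the demo are assumptions.

```python
import random

def atc_step(w_prev, regressors, desired, A, neighbors, mu):
    """One ATC diffusion LMS iteration (76) for real-valued data.

    w_prev[k]     : estimate w_{k,i-1} at node k (list of M floats)
    regressors[k] : regressor u_{k,i}; desired[k] : measurement d_k(i)
    A[l][k]       : weight a_{lk} that node k assigns to neighbor l
    neighbors[k]  : neighborhood of node k, including k itself
    """
    n_nodes, m = len(w_prev), len(w_prev[0])
    psi = []
    for k in range(n_nodes):                      # adaptation step
        err = desired[k] - sum(u * w for u, w in zip(regressors[k], w_prev[k]))
        psi.append([w + mu * u * err for w, u in zip(w_prev[k], regressors[k])])
    return [[sum(A[l][k] * psi[l][j] for l in neighbors[k]) for j in range(m)]
            for k in range(n_nodes)]              # combination step

random.seed(0)
w_true = [1.0, -0.5]                              # hypothetical unknown vector
noise_std = [0.1, 0.045]                          # noise variances of about 0.01 and 0.002
neighbors = [[0, 1], [0, 1]]
A = [[0.5, 0.5], [0.5, 0.5]]                      # uniform weights; columns sum to one
w = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(3000):
    U = [[random.gauss(0.0, 1.0) for _j in range(2)] for _k in range(2)]
    d = [sum(u * wt for u, wt in zip(U[k], w_true)) + random.gauss(0.0, noise_std[k])
         for k in range(2)]
    w = atc_step(w, U, d, A, neighbors, mu=0.05)
print(w)
```

Both nodes end up close to `w_true`; swapping the order of the two steps would give the CTA form instead.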
A. EMSE and MSD for ATC Diffusion LMS

Observe that $A$ is a left-stochastic matrix (the entries on each of its columns add up to one). Let
$A = TDT^{-1}$ denote the eigen-decomposition of $A$, where $T$ is a real and invertible matrix and $D$ is
in the real Jordan canonical form [45], [46]. We assume that $A$ is a primitive/regular matrix, meaning
that there exists an integer $m$ such that all entries of $A^m$ are strictly positive [43], [47]. This condition
essentially states that for any two nodes in the network, there is a path of length $m$ linking them. Since we
assume a connected network and allow for loops because of (77), it follows that $A$ satisfies the regularity
condition |2), P31 . J47j. Then, from the Perron-Frobenius theorem [45], [47 1, the largest eigenvalue in 
magnitude of A is unique and is equal to one. Therefore, D has the following form: 

$$D = \begin{bmatrix} 1 & \\ & J \end{bmatrix} \qquad (78)$$



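where the $(N-1) \times (N-1)$ matrix $J$ consists of real stable Jordan blocks. From Appendix A, the network
EMSE and MSD for ATC diffusion are given by

$$\mathrm{EMSE}_{\mathrm{atc}} \approx \frac{\mu^2}{N}\sum_{m=1}^{M}\lambda_m^2\,\mathrm{vec}(D^TT^TR_vTD)^T(I_{N^2} - \xi_m D \otimes D)^{-1}\mathrm{vec}(T^{-1}T^{-T}) \qquad (79)$$

$$\mathrm{MSD}_{\mathrm{atc}} \approx \frac{\mu^2}{N}\sum_{m=1}^{M}\lambda_m\,\mathrm{vec}(D^TT^TR_vTD)^T(I_{N^2} - \xi_m D \otimes D)^{-1}\mathrm{vec}(T^{-1}T^{-T}) \qquad (80)$$

where $\xi_m$ is given by (59) and

$$R_v = \mathrm{diag}\{\sigma_{v,1}^2, \ldots, \sigma_{v,N}^2\} \qquad (81)$$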

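From (59) and (78), we get

$$\mu(I_{N^2} - \xi_m D \otimes D)^{-1} = \begin{bmatrix} (2\lambda_m)^{-1} & & \\ & \ddots & \\ & & \mu(I - \xi_m J \otimes J)^{-1} \end{bmatrix} \qquad (82)$$

where, to simplify the notation, we are omitting the subscripts of the identity matrices. Since $J$ is stable
and $0 < \xi_m < 1$ under Assumption 2, we have

$$\mu(I - \xi_m J)^{-1} = \mu(I - J + 2\mu\lambda_m J)^{-1} \approx \mu(I - J)^{-1} = O(\mu) \qquad (83)$$

$$\mu(I - \xi_m J \otimes J)^{-1} = \mu(I - J \otimes J + 2\mu\lambda_m J \otimes J)^{-1} \approx \mu(I - J \otimes J)^{-1} = O(\mu) \qquad (84)$$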
(84) 
Therefore, by Assumption 2, we can ignore all blocks on the diagonal of (82) with the exception of the
left-most corner entry so that:

$$\mu(I_{N^2} - \xi_m D \otimes D)^{-1} \approx (2\lambda_m)^{-1}E_{11} \otimes E_{11} \qquad (85)$$

where $E_{11}$ now denotes the $N \times N$ matrix given by $E_{11} = \mathrm{diag}\{1, 0, 0, \ldots, 0\}$. Then,

$$\begin{aligned}
\mu\,\mathrm{vec}(D^TT^TR_vTD)^T&(I_{N^2} - \xi_m D \otimes D)^{-1}\mathrm{vec}(T^{-1}T^{-T}) \\
&\approx \mathrm{vec}(D^TT^TR_vTD)^T[(2\lambda_m)^{-1}E_{11} \otimes E_{11}]\mathrm{vec}(T^{-1}T^{-T}) \\
&= (2\lambda_m)^{-1}\mathrm{vec}(R_v)^T(TE_{11}T^{-1} \otimes TE_{11}T^{-1})\mathrm{vec}(I_N)
\end{aligned} \qquad (86)$$



where we used the fact that $DE_{11} = E_{11}$ because of (78) and $\mathrm{vec}(ABC) = (C^T \otimes A)\mathrm{vec}(B)$ for matrices
$\{A, B, C\}$ of compatible dimensions. Now, note that $TE_{11}T^{-1}$ is a rank-one matrix determined by the
outer product of the left- and right-eigenvectors of $A$ corresponding to the unique eigenvalue at one.
Since $A$ is left-stochastic, this left-eigenvector can be selected as the all-one vector $\mathbb{1}$, i.e., $A^T\mathbb{1} = \mathbb{1}$. Let
us denote the right-eigenvector by $y$ and normalize its element-sum to one, i.e., $Ay = y$ and $y^T\mathbb{1} = 1$.
It follows from the Perron-Frobenius theorem [43], [47] that all entries of $y$ are nonnegative and located
within the range $[0, 1]$. We then get $TE_{11}T^{-1} = y\mathbb{1}^T$. Thus, from (86), the network EMSE (79) can be
rewritten as

$$\begin{aligned}
\mathrm{EMSE}_{\mathrm{atc}} &\approx \frac{\mu\mathrm{Tr}(R_u)}{2N}\,\mathrm{vec}(R_v)^T(y\mathbb{1}^T \otimes y\mathbb{1}^T)\mathrm{vec}(I_N) \\
&= \frac{\mu\mathrm{Tr}(R_u)}{2N}\,\mathrm{vec}(R_v)^T\mathrm{vec}(y\mathbb{1}^T\mathbb{1}y^T)
\end{aligned} \qquad (87)$$

That is,

$$\mathrm{EMSE}_{\mathrm{atc}} \approx \frac{\mu\mathrm{Tr}(R_u)}{2}\,y^TR_vy \qquad (88)$$

Similarly, the network MSD (80) can be rewritten as

$$\mathrm{MSD}_{\mathrm{atc}} \approx \frac{\mu M}{2}\,y^TR_vy \qquad (89)$$
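Expression (88) only needs the Perron vector $y$ of the combination matrix. A minimal sketch of how it might be evaluated is given below, using plain power iteration; the function names, the iteration count, and the $2 \times 2$ example matrices are assumptions made for illustration.

```python
def perron_vector(A, iterations=500):
    """Right eigenvector y of a primitive left-stochastic A (A y = y), with entries summing to one."""
    n = len(A)
    y = [1.0 / n] * n
    for _ in range(iterations):
        y = [sum(A[l][k] * y[k] for k in range(n)) for l in range(n)]
        total = sum(y)
        y = [value / total for value in y]
    return y

def atc_network_emse(mu, trace_Ru, A, noise_vars):
    """Small step-size network EMSE of ATC diffusion, following (88)."""
    y = perron_vector(A)
    return mu * trace_Ru / 2.0 * sum(v * yk * yk for v, yk in zip(noise_vars, y))

A_left = [[0.8, 0.4], [0.2, 0.6]]       # left-stochastic: each column sums to one
A_doubly = [[0.5, 0.5], [0.5, 0.5]]     # doubly-stochastic
y_left = perron_vector(A_left)
emse_doubly = atc_network_emse(0.01, 10.0, A_doubly, [0.01, 0.002])
emse_left = atc_network_emse(0.01, 10.0, A_left, [0.002, 0.01])
print(y_left, emse_doubly, emse_left)
```

For the doubly-stochastic example, $y = \mathbb{1}/N$ and the value equals $\frac{\mu\mathrm{Tr}(R_u)}{2}\cdot\frac{\mathrm{Tr}(R_v)}{N^2}$; the left-stochastic example places more weight on the less noisy node and yields a smaller value.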



B. EMSE and MSD for Block and Incremental LMS 

For $N$ nodes, the block LMS recursion (14) is replaced by

$$w_i = w_{i-1} + \mu'\sum_{k=1}^{N}u_{k,i}^*[d_k(i) - u_{k,i}w_{i-1}] \qquad (90)$$

and the incremental LMS recursion (15) is replaced by

for every $i$:
    initialize with $\psi_0 = w_{i-1}$
    for every $k = 1, 2, \ldots, N$, repeat:
        $\psi_k = \psi_{k-1} + \mu' u_{k,i}^*[d_k(i) - u_{k,i}\psi_{k-1}]$    (91)
    end
    set $w_i = \psi_N$

In order for block and incremental LMS to converge at the same rate as diffusion ATC, we must set
their step-sizes to $\mu' = \mu/N$ (compare with (66)). Following an argument similar to the one presented
in the appendix, we can derive the EMSE and MSD for the block LMS strategy (90) as

$$\mathrm{EMSE}_{\mathrm{blk}} \approx \frac{\mu\mathrm{Tr}(R_u)}{2}\cdot\frac{\mathrm{Tr}(R_v)}{N^2} \qquad (92)$$

and

$$\mathrm{MSD}_{\mathrm{blk}} \approx \frac{\mu M}{2}\cdot\frac{\mathrm{Tr}(R_v)}{N^2} \qquad (93)$$

respectively. A similar argument to (55) (see also expression (84) in [25]) leads to the conclusion that
the performance of incremental LMS (91) can be well approximated by that of block LMS for small
step-sizes². Therefore,

$$\mathrm{EMSE}_{\mathrm{inc}} \approx \frac{\mu\mathrm{Tr}(R_u)}{2}\cdot\frac{\mathrm{Tr}(R_v)}{N^2} \qquad (94)$$

and

$$\mathrm{MSD}_{\mathrm{inc}} \approx \frac{\mu M}{2}\cdot\frac{\mathrm{Tr}(R_v)}{N^2} \qquad (95)$$

For this reason, we shall not distinguish between block LMS and incremental LMS in the sequel.



C. Comparing Network EMSE 

Observe that the EMSE expression (88) for ATC diffusion LMS and (92) for block and incremental
LMS only differ by a scaling factor, namely, $y^TR_vy$ versus $\mathrm{Tr}(R_v)/N^2$. Then, ATC diffusion would
outperform block LMS and incremental LMS when

$$y^TR_vy < \frac{\mathrm{Tr}(R_v)}{N^2} \qquad (96)$$

where $R_v$ is diagonal and given by (81). We assume that the noise variance of at least one node in
the network is different from the other noise variances to exclude the case in which the noise profile is
uniform across the network (in which case $R_v$ would be a scaled multiple of the identity matrix). Thus,
note that, if we select the combination matrix $A$ to be doubly-stochastic, i.e., $A\mathbb{1} = \mathbb{1}$ and $A^T\mathbb{1} = \mathbb{1}$,
then it is straightforward to see that $y = \mathbb{1}/N$ so that

$$y^TR_vy = \frac{\mathrm{Tr}(R_v)}{N^2} \qquad (97)$$

This result means that, for sufficiently small step-sizes and for any doubly-stochastic matrix $A$, the
EMSE performance of ATC diffusion and block LMS match each other. However, as indicated by (77),
the diffusion LMS strategy can employ a broader class of combination matrices, namely, left-stochastic
matrices. If we optimize over the larger set of left-stochastic combination matrices and in view of (97),
we would expect

$$\mathrm{EMSE}_{\mathrm{blk}} \approx \mathrm{EMSE}_{\mathrm{atc}}(A_{\text{doubly-stochastic}}) > \mathrm{EMSE}_{\mathrm{atc}}(A^{\mathrm{opt}}) \qquad (98)$$

where $A^{\mathrm{opt}}$ is the optimal combination matrix that solves the following optimization problem:

$$A^{\mathrm{opt}} = \arg\min_{A \in \mathcal{A}}\; y^TR_vy \quad \text{subject to } Ay = y,\; \mathbb{1}^Ty = 1 \qquad (99)$$

where $\mathcal{A}$ denotes the set consisting of all $N \times N$ left-stochastic matrices whose entries $\{a_{lk}\}$ satisfy the
conditions in (77). We show next how to determine left-stochastic matrices that solve (99).

² Again, we remark that in general incremental LMS outperforms block LMS [44]; however, their performances are similar
when the step-size is sufficiently small [25, App. A].

First note that the optimization problem (99) is equivalent to the following non-convex problem:

$$\min_{A \in \mathcal{A},\; y \in \mathbb{R}^N_+}\; y^TR_vy \quad \text{subject to } Ay = y,\; \mathbb{1}^Ty = 1 \qquad (100)$$

where $\mathbb{R}^N_+$ denotes the $N \times 1$ nonnegative vector space. We solve this problem in two steps. First we
solve the relaxed problem:

$$\min_{y \in \mathbb{R}^N_+}\; y^TR_vy \quad \text{subject to } \mathbb{1}^Ty = 1 \qquad (101)$$

Since $R_v$ is positive definite and diagonal, the closed-form solution for (101) is given by

$$y^o = \frac{R_v^{-1}\mathbb{1}}{\mathbb{1}^TR_v^{-1}\mathbb{1}} \qquad (102)$$
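The closed-form solution (102) is easy to evaluate and to compare against the uniform vector $y = \mathbb{1}/N$ from (97). The following sketch does so for a hypothetical three-node noise profile; the function names and the numerical values are assumptions made for the example.

```python
def optimal_weights(noise_vars):
    """y^o from (102): entries proportional to the inverse noise variances."""
    inverse = [1.0 / v for v in noise_vars]
    total = sum(inverse)
    return [x / total for x in inverse]

def quadratic_form(y, noise_vars):
    """y^T R_v y for diagonal R_v = diag(noise_vars)."""
    return sum(v * yk * yk for v, yk in zip(noise_vars, y))

noise_vars = [0.01, 0.002, 0.005]            # hypothetical noise profile
y_opt = optimal_weights(noise_vars)
y_uniform = [1.0 / len(noise_vars)] * len(noise_vars)
cost_opt = quadratic_form(y_opt, noise_vars)
cost_uniform = quadratic_form(y_uniform, noise_vars)
print(y_opt, cost_opt, cost_uniform)
```

The optimal cost equals $1/\mathrm{Tr}(R_v^{-1})$, while the uniform choice gives $\mathrm{Tr}(R_v)/N^2$, which is larger whenever the noise profile is not uniform.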
Next, if we can determine a primitive left-stochastic matrix $A$ whose right eigenvector associated to
eigenvalue 1 coincides with $y^o$, then we would obtain a solution to (100). Indeed, note that any primitive
left-stochastic matrix $A$ can be regarded as the probability transition matrix of an irreducible aperiodic
Markov chain (based on the connected topology and condition (77) on the weights) [48], [49]. In that
case, a vector $y^o$ that satisfies $Ay^o = y^o$ would correspond to the stationary distribution vector for the
Markov chain. Now given an arbitrary vector $y^o$, whose entries are positive and add up to one, it is
known how to construct a left-stochastic matrix $A$ that would satisfy $Ay^o = y^o$. A procedure due to
Hastings [50] was used in [51] to construct such matrices. Applying the procedure to our vector $y^o$ given
by (102), we arrive at the following combination rule, which we shall refer to as the Hastings rule (we
may add that there are many other choices for $A$ that would satisfy the same requirement $Ay^o = y^o$):

$$\text{Hastings rule:} \quad a_{lk} = \begin{cases} \dfrac{\sigma_{v,k}^2}{\max\{|\mathcal{N}_k|\sigma_{v,k}^2,\; |\mathcal{N}_l|\sigma_{v,l}^2\}}, & l \in \mathcal{N}_k \setminus \{k\} \\[2ex] 1 - \displaystyle\sum_{l \in \mathcal{N}_k \setminus \{k\}} a_{lk}, & l = k \end{cases} \qquad (103)$$

where $|\mathcal{N}_k|$ denotes the cardinality of $\mathcal{N}_k$. It is worth noting that the Hastings rule is a fully-distributed
solution: each node $k$ only needs to obtain the degree-variance product $(|\mathcal{N}_l| - 1)\sigma_{v,l}^2$ from its neighbor
$l$ to compute the corresponding combination weight $a_{lk}$. By using the Hastings rule (103), the vector $y^o$
in (102) is attained and the EMSE expression (88) is therefore minimized. The minimum value of the EMSE
is then given by

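$$\mathrm{EMSE}^{\mathrm{opt}}_{\mathrm{atc}} \approx \frac{\mu\mathrm{Tr}(R_u)}{2}\,y^{oT}R_vy^o = \frac{\mu\mathrm{Tr}(R_u)}{2}\cdot\frac{1}{\mathrm{Tr}(R_v^{-1})} \qquad (104)$$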
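To make the construction concrete, the sketch below builds the combination matrix from the Hastings rule (103) as printed, with $|\mathcal{N}_k|$ counting node $k$ itself, and checks that it is left-stochastic with $Ay^o = y^o$. The function name, the three-node line topology, and the noise variances are assumptions made for illustration.

```python
def hastings_weights(neighbors, noise_vars):
    """Combination matrix with A[l][k] = a_{lk} built from the Hastings rule (103)."""
    n = len(neighbors)
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        n_k = len(neighbors[k])                  # |N_k|, including k itself
        for l in neighbors[k]:
            if l != k:
                n_l = len(neighbors[l])
                A[l][k] = noise_vars[k] / max(n_k * noise_vars[k], n_l * noise_vars[l])
        A[k][k] = 1.0 - sum(A[l][k] for l in neighbors[k] if l != k)
    return A

neighbors = [[0, 1], [0, 1, 2], [1, 2]]          # hypothetical line topology 0 - 1 - 2
noise_vars = [0.01, 0.002, 0.005]
A = hastings_weights(neighbors, noise_vars)
inverse = [1.0 / v for v in noise_vars]
y_opt = [x / sum(inverse) for x in inverse]      # y^o from (102)
Ay = [sum(A[l][k] * y_opt[k] for k in range(3)) for l in range(3)]
column_sums = [sum(A[l][k] for l in range(3)) for k in range(3)]
print(A, Ay, column_sums)
```

The check works because $a_{lk}y^o_k = a_{kl}y^o_l$ for every pair of neighbors, which is the detailed-balance property exploited by the Hastings construction.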

Compared to the EMSE of block and incremental LMS (92) and (94), we conclude that diffusion strategies
using the Hastings rule (103) achieve a lower EMSE level under Assumption 2. This is because, from
the Cauchy-Schwarz inequality [46], we have

$$N^2 < \mathrm{Tr}(R_v)\mathrm{Tr}(R_v^{-1}) \;\Longrightarrow\; \frac{1}{\mathrm{Tr}(R_v^{-1})} < \frac{\mathrm{Tr}(R_v)}{N^2} \qquad (105)$$

when the entries on the diagonal of $R_v$ are not uniform (as we assumed at the beginning of this subsection).

In real applications, where the noise variances are unavailable, each node can estimate its own noise
variance recursively by using the following iteration:

$$\hat{\sigma}_{v,k}^2(i) = (1 - \nu_k)\hat{\sigma}_{v,k}^2(i-1) + \nu_k|d_k(i) - u_{k,i}w_{k,i-1}|^2 \qquad (106)$$
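The recursion (106) is a first-order smoothing filter applied to the squared a-priori error. A minimal sketch for real-valued data follows; the function name is hypothetical, and the demo feeds the true parameter vector in place of $w_{k,i-1}$ so that the error equals the noise sample, which is a simplification made only for illustration.

```python
import random

def update_noise_variance(prev_estimate, nu, d, u, w_prev):
    """One step of (106): smooth the squared error d - u w with forgetting factor nu."""
    err = d - sum(ui * wi for ui, wi in zip(u, w_prev))
    return (1.0 - nu) * prev_estimate + nu * err * err

random.seed(1)
w_true = [1.0, -0.5]                 # hypothetical parameter vector
true_var = 0.01                      # hypothetical noise variance
estimate = 0.0
for _ in range(5000):
    u = [random.gauss(0.0, 1.0) for _j in w_true]
    d = sum(ui * wi for ui, wi in zip(u, w_true)) + random.gauss(0.0, true_var ** 0.5)
    estimate = update_noise_variance(estimate, 0.01, d, u, w_true)
print(estimate)
```

A smaller $\nu_k$ gives a smoother but slower estimate; when the running estimate $w_{k,i-1}$ is used, as in (106), the squared error also contains the excess error of the filter.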

Remark: In the two-node case, we determined the combination weights (38) by seeking coefficients
that essentially minimize the EMSE expressions (37) and (44). The argument in Appendix B expressed
(a) Network topology and noise profile.

Fig. 7. Simulated EMSE curves and theoretical results for ATC diffusion versus block LMS for a network with $N = 20$ nodes.



the EMSE as the sum of two factors: a dominant factor that depends on $\mu$ and a less dominant factor
that depends on higher powers of $\mu$. In the $N$-node network case, we instead used the small step-size
approximation to arrive at expressions (88) and (89), which correspond only to the dominant terms in the
EMSE and MSD expressions and depend on $\mu$. We can regard (88) and (89) as first-order approximations
for the performance of the network for sufficiently small step-sizes.



D. Simulation Results 

We simulate ATC diffusion LMS versus block LMS over a connected network with $N = 20$ nodes.
The unknown vector $w^o$ of length $M = 3$ is randomly generated. We adopt $R_u = I_M$, $\mu = 0.005$ for ATC
diffusion LMS, and $\mu' = \mu/N = 0.00025$ for block LMS. The network topology and the profile of noise
variances $\{\sigma_{v,k}^2\}$ are plotted in Fig. 7a. For ATC algorithms, we simulate three different combination rules:
the first one is the (left-stochastic) adaptive Hastings rule (103) using (106) and without the knowledge
of noise variances, the second one is the Hastings rule (103) with the knowledge of noise variances, and
the third one is the (doubly-stochastic) Metropolis rule [52] (which is a simplified version of the
Hastings rule):

$$\text{Metropolis rule:} \quad a_{lk} = \begin{cases} \dfrac{1}{\max\{|\mathcal{N}_k|,\; |\mathcal{N}_l|\}}, & l \in \mathcal{N}_k \setminus \{k\} \\[2ex] 1 - \displaystyle\sum_{l \in \mathcal{N}_k \setminus \{k\}} a_{lk}, & l = k \end{cases} \qquad (107)$$
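For comparison with the Hastings construction, the Metropolis rule (107) can be sketched in the same way. The function name and the three-node line topology are assumptions made for the example; the check at the end confirms that the resulting matrix is doubly-stochastic.

```python
def metropolis_weights(neighbors):
    """Combination matrix with A[l][k] = a_{lk} built from the Metropolis rule (107)."""
    n = len(neighbors)
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in neighbors[k]:
            if l != k:
                A[l][k] = 1.0 / max(len(neighbors[k]), len(neighbors[l]))
        A[k][k] = 1.0 - sum(A[l][k] for l in neighbors[k] if l != k)
    return A

neighbors = [[0, 1], [0, 1, 2], [1, 2]]          # hypothetical line topology 0 - 1 - 2
A_metropolis = metropolis_weights(neighbors)
row_sums = [sum(row) for row in A_metropolis]
col_sums = [sum(A_metropolis[l][k] for l in range(3)) for k in range(3)]
print(A_metropolis, row_sums, col_sums)
```

Because the off-diagonal weights are symmetric in $k$ and $l$, rows and columns both sum to one, so the Perron vector is $\mathbb{1}/N$ and, by (97), the small step-size EMSE matches that of block LMS.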
We adopted $\nu_k = 0.1$ and $\mu = 0.0054$ for the adaptive Hastings rule (103)-(106) to match the convergence
rate of the other algorithms. We also consider the non-cooperative LMS case for comparison purposes.
The EMSE learning curves are obtained by averaging over 50 experiments and are plotted in Fig. 7b.
It can be seen that ATC diffusion LMS with Metropolis weights exhibits almost the same convergence
behavior as block LMS in the transient phase and attains a steady-state value that is less than 1 dB worse
than block LMS. In comparison, ATC diffusion LMS using adaptive Hastings weights (where the noise
variances are estimated through (106)) has almost the same learning curve as ATC using Hastings weights
with the knowledge of the noise variances; both of them are able to attain about 7 dB gain over block
LMS at steady-state.

VII. Conclusion 

In this work we derived the EMSE levels for different strategies over LMS adaptive networks and 
compared their performance. The results establish that diffusion LMS strategies can deliver lower EMSE 
than centralized solutions employing traditional block or incremental LMS strategies. We first studied 
the case of networks involving two cooperating nodes, where closed-form expressions for the EMSE 
and MSD can be derived. Subsequently, we extended the conclusion to generic $N$-node networks and
established again that, for sufficiently small step-sizes, diffusion strategies can outperform centralized 
block LMS strategies by optimizing over left-stochastic combination matrices. It is worth noting that 
although the optimized combination rules rely on knowledge of the noise statistics, it is possible to 
employ adaptive strategies like (106) to adjust these coefficients on the fly without requiring explicit
knowledge of the noise profile; in this way, the Hastings rule (103) can be implemented in a manner
similar to the adaptive relative variance rule 10, ll37l . lElOll . Clearly, the traditional block and incremental 
implementations (90) and (91) can be modified to incorporate information about the noise profile as well.
In that case, it can be argued that diffusion strategies are still able to match the EMSE performance of 
these modified centralized algorithms. 

Appendix A 

EMSE Expression for General Diffusion LMS with N Nodes

Under Assumptions 1-3, the EMSE expression for node $k$ of the general diffusion strategy (28)-(30)
is given by Eq. (39) from reference [4| (see also 0): 

EMSE fc [vec(y T )} T {I N 2 M 2 - J) _1 vec(^ fcfc R u ) (108) 
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where 

y = fj 2 (Q T R v Q)0R u (109) 

T = B T ®B* (110) 

B = (Q T P T ) ® (hi - vRu) (111) 

and Ekk = diag{0, . . . , 0, 1, 0, . . . , 0} is an N x N all-zero matrix except for the Mi entry on the diagonal, 
which is equal to one. Since for ATC algorithms, P = In and Q = A, and for CTA algorithms, P = A 
and Q = h, we know that PQ = A for both cases. Therefore, we get 

B = A T ® (I M - fiR u ) (112) 

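We can reduce (108) into the form (32), which is more suitable for our purposes, by introducing the 
eigen-decompositions of $R_u$ and $A$. Thus, let $R_u = U \Lambda U^*$ denote the eigen-decomposition of $R_u$, where 
$U$ is unitary and $\Lambda$ is diagonal with positive entries. Let also $A = T D T^{-1}$ denote the eigen-decomposition 
of the real matrix $A$, where $T$ is real and invertible and $D$ is in the real Jordan canonical form [45]. 
Then, the eigen-decomposition of $\mathcal{B}$ is given by 

$$\begin{aligned}
\mathcal{B} &= (T D T^{-1})^T \otimes [U (I_M - \mu\Lambda) U^*] \\
&= (T^{-T} \otimes U)[D^T \otimes (I_M - \mu\Lambda)](T^T \otimes U^*)
\end{aligned} \tag{113}$$

and the eigen-decomposition of $\mathcal{F}$ is then given by 

$$\begin{aligned}
\mathcal{F} &= \{(T^{-T} \otimes U)[D^T \otimes (I_M - \mu\Lambda)](T^T \otimes U^*)\}^T \otimes \{(T^{-T} \otimes U)[D^T \otimes (I_M - \mu\Lambda)](T^T \otimes U^*)\}^* \\
&= \mathcal{X} \{[D^T \otimes (I_M - \mu\Lambda)]^T \otimes [D^T \otimes (I_M - \mu\Lambda)]^*\} \mathcal{X}^{-1} \\
&= \mathcal{X} \{[D \otimes (I_M - \mu\Lambda)] \otimes [D \otimes (I_M - \mu\Lambda)]\} \mathcal{X}^{-1} \\
&= \mathcal{X} (\mathcal{D} \otimes \mathcal{D}) \mathcal{X}^{-1}
\end{aligned} \tag{114}$$

where we used the facts that $\{D, \Lambda\}$ are real and $\Lambda$ is diagonal, and introduced the matrices: 

$$\mathcal{X} = (T^T \otimes U^*)^T \otimes (T^T \otimes U^*)^* \tag{115}$$

$$\mathcal{D} = D \otimes (I_M - \mu\Lambda) \tag{116}$$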
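The factorization in (114) can be checked numerically. The following NumPy sketch builds $\mathcal{F}$, $\mathcal{X}$ and $\mathcal{D}$ for a small example with real data and confirms that $\mathcal{F} = \mathcal{X}(\mathcal{D} \otimes \mathcal{D})\mathcal{X}^{-1}$. The function name and all numerical values are illustrative assumptions, and the sketch further assumes that $A$ is diagonalizable with real eigenvalues, so that $D$ is diagonal.

```python
import numpy as np

def similarity_factors(A, R_u, mu):
    """Build F from (110)-(112) and the factors X, D_cal from (115)-(116).

    Assumes real data (so '*' reduces to transposition) and a combination
    matrix A that is diagonalizable with real eigenvalues.
    """
    M = R_u.shape[0]
    lam, U = np.linalg.eigh(R_u)            # R_u = U diag(lam) U^T
    d, T = np.linalg.eig(A)                 # A = T diag(d) T^{-1}
    d, T = np.real(d), np.real(T)

    B = np.kron(A.T, np.eye(M) - mu * R_u)  # (112)
    F = np.kron(B.T, B.T)                   # (110) for real data

    Z = np.kron(T.T, U.T)                   # T^T kron U^*
    X = np.kron(Z.T, Z.T)                   # (115) for real data
    D_cal = np.kron(np.diag(d), np.eye(M) - mu * np.diag(lam))  # (116)
    return F, X, D_cal

# Illustrative two-node example (values are assumptions, not from the text).
A = np.array([[0.7, 0.4],
              [0.3, 0.6]])                  # left-stochastic: columns sum to one
R_u = np.array([[2.0, 0.5],
                [0.5, 1.0]])
F, X, D_cal = similarity_factors(A, R_u, mu=0.01)
matches = np.allclose(F, X @ np.kron(D_cal, D_cal) @ np.linalg.inv(X))
```

In this example every diagonal entry of `D_cal` has magnitude below one, which is the stability property invoked when the inverse in (108) is expanded as a series.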
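Then, from (109)-(115), we get 

$$\begin{aligned}
\mathcal{X}^T \mathrm{vec}(\mathcal{Y}^T) &= [(T^T \otimes U^*) \otimes (T^T \otimes U^T)] \cdot \mu^2 \mathrm{vec}(Q^T R_v Q \otimes R_u^T) \\
&= \mu^2 \mathrm{vec}[(T^T \otimes U^T)(Q^T R_v Q \otimes R_u^T)(T \otimes U^{*T})] \\
&= \mu^2 \mathrm{vec}(T^T Q^T R_v Q T \otimes \Lambda)
\end{aligned} \tag{117}$$

where we used the fact that $T$ is real. Likewise, we get 

$$\begin{aligned}
\mathcal{X}^{-1} \mathrm{vec}(E_{kk} \otimes R_u) &= [(T^{-T} \otimes U)^T \otimes (T^{-T} \otimes U)^*] \cdot \mathrm{vec}(E_{kk} \otimes R_u) \\
&= \mathrm{vec}(T^{-1} E_{kk} T^{-T} \otimes \Lambda)
\end{aligned} \tag{118}$$

Then, from (114)-(118), the EMSE expression in (108) can be rewritten as 

$$\begin{aligned}
\mathrm{EMSE}_k &\approx [\mathrm{vec}(\mathcal{Y}^T)]^T \mathcal{X} (I_{N^2M^2} - \mathcal{D} \otimes \mathcal{D})^{-1} \mathcal{X}^{-1} \mathrm{vec}(E_{kk} \otimes R_u) \\
&= \mu^2 [\mathrm{vec}(T^T Q^T R_v Q T \otimes \Lambda)]^T (I_{N^2M^2} - \mathcal{D} \otimes \mathcal{D})^{-1} \mathrm{vec}(T^{-1} E_{kk} T^{-T} \otimes \Lambda)
\end{aligned} \tag{119}$$

Using the fact that $\mathcal{D}$ in (116) is stable under Assumption 2, we can further obtain 

$$\begin{aligned}
\mathrm{EMSE}_k &\approx \mu^2 [\mathrm{vec}(T^T Q^T R_v Q T \otimes \Lambda)]^T \Big( \sum_{j=0}^{\infty} \mathcal{D}^j \otimes \mathcal{D}^j \Big) \mathrm{vec}(T^{-1} E_{kk} T^{-T} \otimes \Lambda) \\
&= \mu^2 [\mathrm{vec}(T^T Q^T R_v Q T \otimes \Lambda)]^T \sum_{j=0}^{\infty} \mathrm{vec}[\mathcal{D}^j (T^{-1} E_{kk} T^{-T} \otimes \Lambda) \mathcal{D}^{Tj}] \\
&= \mu^2 \sum_{j=0}^{\infty} \mathrm{Tr}[(T^T Q^T R_v Q T \otimes \Lambda) \mathcal{D}^j (T^{-1} E_{kk} T^{-T} \otimes \Lambda) \mathcal{D}^{Tj}]
\end{aligned} \tag{120}$$

where we used the identities $\mathrm{vec}(ABC) = (C^T \otimes A)\mathrm{vec}(B)$ and $\mathrm{Tr}(AB) = [\mathrm{vec}(A^T)]^T \mathrm{vec}(B)$ for 
matrices $\{A, B, C\}$ of compatible dimensions. From (116), we get 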

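$$\begin{aligned}
&\mathrm{Tr}[(T^T Q^T R_v Q T \otimes \Lambda) \mathcal{D}^j (T^{-1} E_{kk} T^{-T} \otimes \Lambda) \mathcal{D}^{Tj}] \\
&\quad = \mathrm{Tr}\{(T^T Q^T R_v Q T \otimes \Lambda)[D^j \otimes (I_M - \mu\Lambda)^j](T^{-1} E_{kk} T^{-T} \otimes \Lambda)[D^{Tj} \otimes (I_M - \mu\Lambda)^j]\} \\
&\quad = \mathrm{Tr}[T^T Q^T R_v Q T D^j T^{-1} E_{kk} T^{-T} D^{Tj} \otimes \Lambda (I_M - \mu\Lambda)^j \Lambda (I_M - \mu\Lambda)^j] \\
&\quad = \mathrm{Tr}[\Lambda (I_M - \mu\Lambda)^j \Lambda (I_M - \mu\Lambda)^j \otimes T^T Q^T R_v Q T D^j T^{-1} E_{kk} T^{-T} D^{Tj}] \\
&\quad = \sum_{m=1}^{M} \lambda_m^2 (1 - \mu\lambda_m)^{2j} \, \mathrm{Tr}(T^T Q^T R_v Q T D^j T^{-1} E_{kk} T^{-T} D^{Tj}) \\
&\quad = \sum_{m=1}^{M} \lambda_m^2 (1 - \mu\lambda_m)^{2j} \, [\mathrm{vec}(T^T Q^T R_v Q T)]^T (D^j \otimes D^j) \mathrm{vec}(T^{-1} E_{kk} T^{-T})
\end{aligned} \tag{121}$$

where we used the identity $\mathrm{Tr}(A \otimes B) = \mathrm{Tr}(B \otimes A)$ for square matrices $\{A, B\}$ and the fact that 
$\Lambda (I_M - \mu\Lambda)^j \Lambda (I_M - \mu\Lambda)^j$ is diagonal. Substituting (121) back into (120) leads to 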

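$$\begin{aligned}
\mathrm{EMSE}_k &\approx \mu^2 \sum_{m=1}^{M} \lambda_m^2 [\mathrm{vec}(T^T Q^T R_v Q T)]^T \Big\{ \sum_{j=0}^{\infty} [(1 - \mu\lambda_m)^2 D \otimes D]^j \Big\} \mathrm{vec}(T^{-1} E_{kk} T^{-T}) \\
&= \mu^2 \sum_{m=1}^{M} \lambda_m^2 [\mathrm{vec}(T^T Q^T R_v Q T)]^T [I_{N^2} - (1 - \mu\lambda_m)^2 D \otimes D]^{-1} \mathrm{vec}(T^{-1} E_{kk} T^{-T}) \\
&\approx \mu^2 \sum_{m=1}^{M} \lambda_m^2 [\mathrm{vec}(T^T Q^T R_v Q T)]^T [I_{N^2} - (1 - 2\mu\lambda_m) D \otimes D]^{-1} \mathrm{vec}(T^{-1} E_{kk} T^{-T})
\end{aligned} \tag{122}$$

where $(1 - \mu\lambda_m)^2 \approx 1 - 2\mu\lambda_m$ due to Assumption 2. 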
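As a sanity check on the reduction from (108) to (122), the following NumPy sketch evaluates both expressions for a small two-node network with real data. It compares (108) with the second line of (122), that is, before the approximation $(1 - \mu\lambda_m)^2 \approx 1 - 2\mu\lambda_m$ is applied, so the two values should agree up to rounding. The helper names and all numerical values are illustrative assumptions; the sketch also assumes that $A$ is diagonalizable with real eigenvalues.

```python
import numpy as np

def vec(X):
    """Column-major vectorization, as required by vec(ABC) = (C^T kron A) vec(B)."""
    return X.reshape(-1, order="F")

def unit_matrix(N, k):
    """E_kk: all-zero N x N matrix with a one at diagonal position k (0-based)."""
    E = np.zeros((N, N))
    E[k, k] = 1.0
    return E

def emse_direct(A, Q, R_u, R_v, mu, k):
    """EMSE of node k from (108)-(112) for real data."""
    N, M = A.shape[0], R_u.shape[0]
    Y = mu**2 * np.kron(Q.T @ R_v @ Q, R_u)           # (109)
    B = np.kron(A.T, np.eye(M) - mu * R_u)            # (112)
    F = np.kron(B.T, B.T)                             # (110), B^* = B^T here
    rhs = vec(np.kron(unit_matrix(N, k), R_u))
    return vec(Y.T) @ np.linalg.solve(np.eye((N * M) ** 2) - F, rhs)

def emse_reduced(A, Q, R_u, R_v, mu, k):
    """EMSE of node k from the second line of (122)."""
    N = A.shape[0]
    lam = np.linalg.eigvalsh(R_u)
    d, T = np.linalg.eig(A)                           # assumes real eigenvalues
    d, T = np.real(d), np.real(T)
    T_inv = np.linalg.inv(T)
    left = vec(T.T @ Q.T @ R_v @ Q @ T)
    right = vec(T_inv @ unit_matrix(N, k) @ T_inv.T)
    DD = np.kron(np.diag(d), np.diag(d))
    total = 0.0
    for lam_m in lam:
        M_inv_arg = np.eye(N * N) - (1.0 - mu * lam_m) ** 2 * DD
        total += lam_m**2 * (left @ np.linalg.solve(M_inv_arg, right))
    return mu**2 * total

# Illustrative values (assumptions, not taken from the text).
A = np.array([[0.7, 0.4],
              [0.3, 0.6]])
R_u = np.array([[2.0, 0.5],
                [0.5, 1.0]])
R_v = np.diag([0.1, 0.05])
mu = 0.01
cta = [(emse_direct(A, np.eye(2), R_u, R_v, mu, k),
        emse_reduced(A, np.eye(2), R_u, R_v, mu, k)) for k in range(2)]  # Q = I_N
atc = [(emse_direct(A, A, R_u, R_v, mu, k),
        emse_reduced(A, A, R_u, R_v, mu, k)) for k in range(2)]          # Q = A
```

Each pair in `cta` and `atc` holds the direct and the reduced value for one node. The reduced form only requires solving $M$ systems of size $N^2$ instead of one system of size $N^2M^2$.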

Appendix B 

Minimizing the Network Performance for Diffusion LMS 

To minimize the network EMSE for CTA given by (37), we introduce two auxiliary variables $\eta$ and $\theta$ 
such that $\alpha + \beta = 1 + \eta$ and $1 - \beta = \theta(1 - \alpha)$, where $-1 < \eta < 1$ and $\theta > 0$. The network EMSE (37) 
can be rewritten as 

$$\mathrm{EMSE}_{\mathrm{cta}} \approx \mu^2 \sigma_{v,1}^2 \sum_{m=1}^{M} \frac{\lambda_m^2}{(1+\theta)^2} \left[ \frac{\theta^2 + \gamma}{1 - \xi_m} + \frac{(1-\theta)(\theta-\gamma)}{1 - \xi_m \eta} + \frac{(1+\gamma)(1+\theta^2)}{2(1 - \xi_m \eta^2)} \right] \tag{123}$$

where $\gamma = \sigma_{v,2}^2/\sigma_{v,1}^2 < 1$ and $0 < \xi_m < 1$ is given by (33) under Assumption 2. Minimizing expression 
(123) in closed form over both variables $\{\theta, \eta\}$ is generally non-trivial. We exploit the fact that the step- 
size is sufficiently small to help locate the values of $\theta$ and $\eta$ that approximately minimize the value of 
(123). For this purpose, we first substitute (33) into (123) and use Assumption 2 to note that 

$$\mathrm{EMSE}_{\mathrm{cta}} \approx \frac{\mu \mathrm{Tr}(R_u) \sigma_{v,1}^2}{2} \cdot \frac{\theta^2 + \gamma}{(1+\theta)^2} + O(\mu^2) \tag{124}$$

Expression (124) writes the EMSE as the sum of two terms: the first term is linear in the step-size and 
depends only on $\theta$, and the second term depends on higher-order powers of the step-size. For sufficiently 
small step-sizes, the first term is dominant and we can ignore the second term. Doing so allows us to 
estimate the value of $\theta$ that minimizes (123). Observe that the first term on the right-hand side of (124) is minimized 
at $\theta^o = \gamma$ because 



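$$\begin{aligned}
\gamma^2 + \theta^2 \geq 2\theta\gamma \;&\Longleftrightarrow\; \theta^2\gamma + \gamma + \gamma^2 + \theta^2 \geq \theta^2\gamma + \gamma + 2\theta\gamma \\
&\Longleftrightarrow\; \frac{\theta^2 + \gamma}{(1+\theta)^2} \geq \frac{\gamma}{1+\gamma}
\end{aligned} \tag{125}$$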
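The same minimizer can be confirmed by differentiation. Writing $g(\theta)$ for the $\theta$-dependent factor in the first term of (124) (the symbol $g$ is introduced here only for illustration), a short calculation gives:

```latex
g(\theta) = \frac{\theta^2 + \gamma}{(1+\theta)^2},
\qquad
g'(\theta) = \frac{2\theta(1+\theta) - 2(\theta^2 + \gamma)}{(1+\theta)^3}
           = \frac{2(\theta - \gamma)}{(1+\theta)^3},
\qquad
g(\gamma) = \frac{\gamma^2 + \gamma}{(1+\gamma)^2} = \frac{\gamma}{1+\gamma}.
```

Since $g'(\theta) < 0$ for $0 < \theta < \gamma$ and $g'(\theta) > 0$ for $\theta > \gamma$, the point $\theta = \gamma$ is the unique minimizer over $\theta > 0$, in agreement with the bound in (125).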

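We now substitute $\theta^o = \gamma$ back into the original expression (123) for the network EMSE to find that: 

$$\mathrm{EMSE}_{\mathrm{cta}} \approx \mu^2 \sigma_{v,1}^2 \sum_{m=1}^{M} \lambda_m^2 \left[ \frac{\gamma}{1+\gamma} \cdot \frac{1}{1 - \xi_m} + \frac{1+\gamma^2}{1+\gamma} \cdot \frac{1}{2(1 - \xi_m \eta^2)} \right] \tag{126}$$

We use this form to minimize the higher-order terms in $\mu$ over the variable $\eta$. Since $0 < \xi_m < 1$, expression 
(126) is minimized at $\eta^o = 0$. The value of the EMSE under $\theta^o = \gamma$ and $\eta^o = 0$ is then given by 

$$\mathrm{EMSE}_{\mathrm{cta}}(\theta^o = \gamma, \eta^o = 0) \approx \sigma_{v,1}^2 \left[ \frac{\gamma}{1+\gamma} \cdot \frac{\mu \mathrm{Tr}(R_u)}{2} + \frac{1+\gamma^2}{2(1+\gamma)} \cdot \mu^2 \sum_{m=1}^{M} \lambda_m^2 \right] \tag{127}$$

Similarly, we can employ the same approximate argument to find that the solution $(\theta^o, \eta^o)$ essentially 
minimizes the network MSD under Assumption 2; the corresponding value of the MSD is 

$$\mathrm{MSD}_{\mathrm{cta}}(\theta^o = \gamma, \eta^o = 0) \approx \sigma_{v,1}^2 \left[ \frac{\gamma}{1+\gamma} \cdot \frac{\mu M}{2} + \frac{1+\gamma^2}{1+\gamma} \cdot \frac{\mu^2 \mathrm{Tr}(R_u)}{2} \right] \tag{128}$$

The solution $\{\theta^o = \gamma, \eta^o = 0\}$ translates into (38), where $0 < \gamma < 1$. 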
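The small step-size argument above can be illustrated with a coarse grid search. The sketch below evaluates the network EMSE of CTA in the $(\theta, \eta)$ parameterization, locates its minimizer on a grid, and compares the value at $(\gamma, 0)$ with the closed form (127). It assumes $\xi_m = 1 - 2\mu\lambda_m$, which is consistent with the approximation used at the end of Appendix A but is not restated in this appendix; the function names, grid and numerical values are likewise illustrative assumptions.

```python
import numpy as np

def emse_cta(theta, eta, gamma, lam, mu, sigma2_v1):
    """Network EMSE of two-node CTA diffusion in the (theta, eta) form of (123)."""
    lam = np.asarray(lam, dtype=float)
    xi = 1.0 - 2.0 * mu * lam                       # assumed form of xi_m
    bracket = ((theta**2 + gamma) / (1.0 - xi)
               + (1.0 - theta) * (theta - gamma) / (1.0 - xi * eta)
               + (1.0 + gamma) * (1.0 + theta**2) / (2.0 * (1.0 - xi * eta**2)))
    return mu**2 * sigma2_v1 * np.sum(lam**2 * bracket) / (1.0 + theta) ** 2

def emse_cta_optimal(gamma, lam, mu, sigma2_v1):
    """Closed-form value (127) at theta = gamma, eta = 0."""
    lam = np.asarray(lam, dtype=float)
    first = gamma / (1.0 + gamma) * mu * lam.sum() / 2.0
    second = (1.0 + gamma**2) / (2.0 * (1.0 + gamma)) * mu**2 * np.sum(lam**2)
    return sigma2_v1 * (first + second)

def grid_minimizer(gamma, lam, mu, sigma2_v1):
    """Return the (theta, eta) grid point with the smallest network EMSE."""
    thetas = 0.05 * np.arange(1, 41)                # 0.05, 0.10, ..., 2.00
    etas = np.linspace(-0.9, 0.9, 37)               # step 0.05, includes 0
    values = np.array([[emse_cta(t, e, gamma, lam, mu, sigma2_v1) for e in etas]
                       for t in thetas])
    i, j = np.unravel_index(np.argmin(values), values.shape)
    return thetas[i], etas[j]

# Illustrative values (assumptions): eigenvalues of R_u, step-size, noise levels.
gamma, lam, mu, sigma2_v1 = 0.5, [2.0, 1.0], 1e-3, 0.1
theta_best, eta_best = grid_minimizer(gamma, lam, mu, sigma2_v1)
```

For this example the grid minimizer is expected to sit at the grid point closest to $(\gamma, 0)$, and `emse_cta(gamma, 0.0, ...)` coincides with `emse_cta_optimal(...)`.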

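In a similar manner, in order to minimize the network EMSE of ATC given by (44), we introduce two 
auxiliary variables $\eta$ and $\theta$ such that $\alpha + \beta = 1 + \eta$ and $1 - \beta = \theta(1 - \alpha)$, where $-1 < \eta < 1$ and 
$\theta > 0$. Then, from (44) we have 

$$\mathrm{EMSE}_{\mathrm{atc}} \approx \mu^2 \sigma_{v,1}^2 \sum_{m=1}^{M} \frac{\lambda_m^2}{(1+\theta)^2} \left[ \frac{\theta^2 + \gamma}{1 - \xi_m} + \frac{(1-\theta)(\theta-\gamma)\,\eta}{1 - \xi_m \eta} + \frac{(1+\gamma)(1+\theta^2)\,\eta^2}{2(1 - \xi_m \eta^2)} \right] \tag{129}$$

for which we can again motivate the selection $\{\theta^o = \gamma, \eta^o = 0\}$. The value of the network EMSE at 
$\theta^o = \gamma$ and $\eta^o = 0$ is then given by 

$$\mathrm{EMSE}_{\mathrm{atc}}(\theta^o = \gamma, \eta^o = 0) \approx \sigma_{v,1}^2 \cdot \frac{\gamma}{1+\gamma} \cdot \frac{\mu \mathrm{Tr}(R_u)}{2} \tag{130}$$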
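The two-node ATC expression can be cross-checked against the general $N$-node result of Appendix A. The sketch below evaluates the last line of (122), averaged over the two nodes, and compares it with the closed form in the $(\theta, \eta)$ parameterization. Several ingredients are assumptions made for illustration: the layout of the left-stochastic matrix $A$ with entries $\alpha$, $1-\beta$, $1-\alpha$, $\beta$, the choice $\xi_m = 1 - 2\mu\lambda_m$, the definition of the network EMSE as the average of the node EMSEs, and all function names and numerical values.

```python
import numpy as np

def two_node_setup(theta, eta):
    """Combination matrix A and its eigen-structure for given (theta, eta).

    Solves alpha + beta = 1 + eta and 1 - beta = theta * (1 - alpha).
    """
    alpha = (eta + theta) / (1.0 + theta)
    beta = (1.0 + eta * theta) / (1.0 + theta)
    A = np.array([[alpha, 1.0 - beta],
                  [1.0 - alpha, beta]])             # assumed layout, columns sum to one
    T = np.array([[theta, 1.0],
                  [1.0, -1.0]])                     # eigenvectors for eigenvalues 1 and eta
    d = np.array([1.0, eta])
    return A, T, d

def network_emse_general(Q, T, d, lam, sigma2_v, mu):
    """Node-averaged EMSE from the last line of (122)."""
    N = T.shape[0]
    R_v = np.diag(sigma2_v)
    T_inv = np.linalg.inv(T)
    left = (T.T @ Q.T @ R_v @ Q @ T).reshape(-1, order="F")
    # Averaging E_kk over k gives I_N / N, hence T^{-1} T^{-T} / N.
    right = (T_inv @ T_inv.T / N).reshape(-1, order="F")
    DD = np.kron(np.diag(d), np.diag(d))
    total = 0.0
    for lam_m in lam:
        xi = 1.0 - 2.0 * mu * lam_m                 # assumed form of xi_m
        total += lam_m**2 * (left @ np.linalg.solve(np.eye(N * N) - xi * DD, right))
    return mu**2 * total

def emse_atc_closed(theta, eta, gamma, lam, mu, sigma2_v1):
    """Closed-form two-node ATC network EMSE in the (theta, eta) form of (129)."""
    lam = np.asarray(lam, dtype=float)
    xi = 1.0 - 2.0 * mu * lam
    bracket = ((theta**2 + gamma) / (1.0 - xi)
               + (1.0 - theta) * (theta - gamma) * eta / (1.0 - xi * eta)
               + (1.0 + gamma) * (1.0 + theta**2) * eta**2 / (2.0 * (1.0 - xi * eta**2)))
    return mu**2 * sigma2_v1 * np.sum(lam**2 * bracket) / (1.0 + theta) ** 2

# Illustrative values (assumptions).
theta, eta, mu = 0.8, 0.3, 0.01
lam = [2.0, 1.0]
sigma2_v = [0.2, 0.1]
gamma = sigma2_v[1] / sigma2_v[0]
A, T, d = two_node_setup(theta, eta)
general = network_emse_general(A, T, d, lam, sigma2_v, mu)   # ATC: Q = A
closed = emse_atc_closed(theta, eta, gamma, lam, mu, sigma2_v[0])
```

Setting `theta = gamma` and `eta = 0` in `emse_atc_closed` reduces it to the single term $\sigma_{v,1}^2 \frac{\gamma}{1+\gamma} \frac{\mu \mathrm{Tr}(R_u)}{2}$, since the last two terms in the bracket vanish.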
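
Appendix C 

Derivation of EMSE for Block LMS Networks 

We start from (14). To simplify the notation, we rewrite (1) and (14) as 

$$\boldsymbol{d}_i = \boldsymbol{U}_i w^o + \boldsymbol{v}_i \tag{131}$$

$$\boldsymbol{w}_i = \boldsymbol{w}_{i-1} + \mu \boldsymbol{U}_i^* (\boldsymbol{d}_i - \boldsymbol{U}_i \boldsymbol{w}_{i-1}) \tag{132}$$

where 

$$\boldsymbol{U}_i = \mathrm{col}\{\boldsymbol{u}_{1,i}, \boldsymbol{u}_{2,i}\} \tag{133}$$

$$\boldsymbol{d}_i = \mathrm{col}\{\boldsymbol{d}_1(i), \boldsymbol{d}_2(i)\} \tag{134}$$

$$\boldsymbol{v}_i = \mathrm{col}\{\boldsymbol{v}_1(i), \boldsymbol{v}_2(i)\} \tag{135}$$

The error recursion is then given by 

$$\tilde{\boldsymbol{w}}_i = (I_M - \mu \boldsymbol{U}_i^* \boldsymbol{U}_i) \tilde{\boldsymbol{w}}_{i-1} - \mu \boldsymbol{U}_i^* \boldsymbol{v}_i \tag{136}$$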
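Let $\Sigma$ be an arbitrary $M \times M$ positive semi-definite matrix that we are free to choose. Using (136), we 
can evaluate the weighted square quantity $\|\tilde{\boldsymbol{w}}_i\|_\Sigma^2 = \tilde{\boldsymbol{w}}_i^* \Sigma \tilde{\boldsymbol{w}}_i$. Doing so and taking expectations under 
Assumption 1, we arrive at the following weighted variance relation [16]: 

$$\mathrm{E}\|\tilde{\boldsymbol{w}}_i\|_\Sigma^2 = \mathrm{E}\|\tilde{\boldsymbol{w}}_{i-1}\|_{\Sigma'}^2 + \mu^2 \mathrm{E}\|\boldsymbol{U}_i^* \boldsymbol{v}_i\|_\Sigma^2 \tag{137}$$

where 

$$\Sigma' = \mathrm{E}(I_M - \mu \boldsymbol{U}_i^* \boldsymbol{U}_i) \Sigma (I_M - \mu \boldsymbol{U}_i^* \boldsymbol{U}_i) \approx \Sigma - 2\mu R_u \Sigma - 2\mu \Sigma R_u \tag{138}$$

where, in view of Assumption 2, we are dropping higher-order terms in $\mu$. Let again $R_u = U \Lambda U^*$ denote 
the eigen-decomposition of $R_u$. We then introduce the transformed quantities: 

$$\bar{\boldsymbol{w}}_i \triangleq U^* \tilde{\boldsymbol{w}}_i, \qquad \bar{\boldsymbol{U}}_i \triangleq \boldsymbol{U}_i U \tag{139}$$

$$\bar{\Sigma} \triangleq U^* \Sigma U, \qquad \bar{\Sigma}' \triangleq U^* \Sigma' U \tag{140}$$

Relation (137) is accordingly transformed into 

$$\mathrm{E}\|\bar{\boldsymbol{w}}_i\|_{\bar{\Sigma}}^2 = \mathrm{E}\|\bar{\boldsymbol{w}}_{i-1}\|_{\bar{\Sigma}'}^2 + \mu^2 \mathrm{E}\|\bar{\boldsymbol{U}}_i^* \boldsymbol{v}_i\|_{\bar{\Sigma}}^2 \tag{141}$$

where 

$$\bar{\Sigma}' \approx \bar{\Sigma} - 2\mu \Lambda \bar{\Sigma} - 2\mu \bar{\Sigma} \Lambda \tag{142}$$

Since we are free to choose $\Sigma$, or equivalently, $\bar{\Sigma}$, let $\bar{\Sigma}$ be diagonal and nonnegative. Then, it can be 
verified that $\bar{\Sigma}'$ is also diagonal and nonnegative under Assumptions 1-3 so that 

$$\bar{\Sigma}' \approx (I_M - 4\mu\Lambda) \bar{\Sigma} \tag{143}$$

Under Assumption 1, the second term on the right-hand side of (141) evaluates to 

$$\mu^2 \mathrm{E}\|\bar{\boldsymbol{U}}_i^* \boldsymbol{v}_i\|_{\bar{\Sigma}}^2 = \mu^2 \mathrm{Tr}[R_v \, \mathrm{E}(\bar{\boldsymbol{U}}_i \bar{\Sigma} \bar{\boldsymbol{U}}_i^*)] \tag{144}$$

where 

$$\mathrm{E}\, \bar{\boldsymbol{U}}_i \bar{\Sigma} \bar{\boldsymbol{U}}_i^* = \mathrm{E} \begin{bmatrix} \bar{\boldsymbol{u}}_{1,i} \bar{\Sigma} \bar{\boldsymbol{u}}_{1,i}^* & \bar{\boldsymbol{u}}_{1,i} \bar{\Sigma} \bar{\boldsymbol{u}}_{2,i}^* \\ \bar{\boldsymbol{u}}_{2,i} \bar{\Sigma} \bar{\boldsymbol{u}}_{1,i}^* & \bar{\boldsymbol{u}}_{2,i} \bar{\Sigma} \bar{\boldsymbol{u}}_{2,i}^* \end{bmatrix} = \mathrm{Tr}(\bar{\Sigma} \Lambda) I_2 \tag{145}$$

Therefore, we get 

$$\mu^2 \mathrm{E}\|\bar{\boldsymbol{U}}_i^* \boldsymbol{v}_i\|_{\bar{\Sigma}}^2 = \mu^2 \mathrm{Tr}(R_v) \mathrm{Tr}(\bar{\Sigma} \Lambda) \tag{146}$$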
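When the filter is mean-square stable, taking the limit as $i \to \infty$ of both sides of (141) and selecting 
$\bar{\Sigma} = I_M/(4\mu)$, we get 

$$\mathrm{EMSE}_{\mathrm{blk}} \approx \frac{\mu \mathrm{Tr}(R_u)}{2} \cdot \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \tag{147}$$

Likewise, by selecting $\bar{\Sigma} = \Lambda^{-1}/(4\mu)$ and taking the limit of both sides of (141) as $i \to \infty$, we arrive at 

$$\mathrm{MSD}_{\mathrm{blk}} \approx \frac{\mu M}{2} \cdot \frac{\sigma_{v,1}^2 + \sigma_{v,2}^2}{2} \tag{148}$$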
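The small step-size prediction for block LMS can be compared against a simulation. The sketch below runs the two-node recursion $\boldsymbol{w}_i = \boldsymbol{w}_{i-1} + \mu \boldsymbol{U}_i^*(\boldsymbol{d}_i - \boldsymbol{U}_i \boldsymbol{w}_{i-1})$ with real Gaussian regressors and noise, time-averages $\|\tilde{\boldsymbol{w}}_{i-1}\|_{R_u}^2$ after a burn-in period, and compares the result with $\mu \mathrm{Tr}(R_u)(\sigma_{v,1}^2 + \sigma_{v,2}^2)/4$. The data model, function names, seed and numerical values are assumptions chosen for illustration, and a time average over one run is only a rough estimate of the ensemble average.

```python
import numpy as np

def simulate_block_lms_emse(R_u, sigma2_v, mu, n_iter=40000, burn_in=4000, seed=0):
    """Time-averaged EMSE of two-node block LMS with real Gaussian data."""
    rng = np.random.default_rng(seed)
    M = R_u.shape[0]
    L = np.linalg.cholesky(R_u)                  # R_u = L L^T
    noise_std = np.sqrt(np.asarray(sigma2_v, dtype=float))
    w_o = rng.standard_normal(M)                 # unknown model (illustrative)
    w = np.zeros(M)
    acc = 0.0
    for i in range(n_iter):
        U = rng.standard_normal((2, M)) @ L.T    # each row has covariance R_u
        v = noise_std * rng.standard_normal(2)   # independent noise at the two nodes
        d = U @ w_o + v
        if i >= burn_in:
            err = w_o - w                        # error vector before the update
            acc += err @ R_u @ err               # weighted norm with weighting R_u
        w = w + mu * U.T @ (d - U @ w)           # block LMS update
    return acc / (n_iter - burn_in)

def block_lms_emse_theory(R_u, sigma2_v, mu):
    """Small step-size EMSE of two-node block LMS."""
    return mu * np.trace(R_u) / 2.0 * np.sum(sigma2_v) / 2.0

# Illustrative values (assumptions).
R_u = np.array([[2.0, 0.5],
                [0.5, 1.0]])
sigma2_v = [0.2, 0.1]
mu = 0.01
emse_sim = simulate_block_lms_emse(R_u, sigma2_v, mu)
emse_theory = block_lms_emse_theory(R_u, sigma2_v, mu)
```

The two values should be of the same size for small step-sizes; the agreement is approximate because the theoretical expression drops higher-order terms in $\mu$ and the simulated value is a finite-sample average.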